Unable to ... from Word

Time: 2018-09-21 15:55:44

Tags: java docx4j

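I can parse a Word document by table/row/cell and get the text from each cell, unless the cell has a content control (for example a drop-down). When a content control is present, nothing is pulled out. I have tried searching for Text.class and for Tc.class; neither finds anything, even though both elements are part of the content control's XML block.

I looked through the class types in org.docx4j.wml and tried several that seemed appropriate. CTSdtCell does find the block I need, but I could not get any further with it.

Judging by the output, the search finds the sdt content but not the cell (w:tc) inside it. If the cell is not found, the text (w:t) is not found either.

The document has nine rows. I removed all content controls from the first two rows and left the other seven unchanged. When the code reaches a row with content controls, it does not see a wrapped cell as a cell (w:tc), only as a content control (w:sdt) with no cell inside it.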

import java.io.File;
import java.util.List;

import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.CTSdtCell;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Tr;

public class ReadWordDocTest implements Utilities {

    private static final String OUTLOOK_DOC_PATH = System.getProperty("user.home") + "\\workspace\\Test\\Projects\\";

    public static void main(String[] args) throws Exception {

        new ReadWordDocTest();

    }

    public ReadWordDocTest() throws Exception {

        String documentFilename = ("ATL.docx");

        WordprocessingMLPackage mlp = WordprocessingMLPackage.load(new File(OUTLOOK_DOC_PATH + documentFilename));
        MainDocumentPart mdp = mlp.getMainDocumentPart();

        List<Object> rowsList = getAllElementFromObject(mdp, Tr.class);

        rowsList.subList(0,  2).clear();    // Header stuff. Skip.

        // Rows
        for (Object row : rowsList) {

            List<Object> cellsList = getAllElementFromObject(row, Tc.class);
            List<Object> sdtObjList = getAllElementFromObject(row, CTSdtCell.class);        

            System.out.println("Cells " + cellsList.size() + " Content control " + sdtObjList.size());

        }
    }
}

Output

Cells 7 Content control 0
Cells 7 Content control 0
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4

Sample XML from a cell that uses a content control

<w:sdt xmlns:dsp="http://schemas.microsoft.com/office/drawing/2008/diagram" xmlns:cppr="http://schemas.microsoft.com/office/2006/coverPageProps" xmlns:odx="http://opendope.org/xpaths" xmlns:c14="http://schemas.microsoft.com/office/drawing/2007/8/2/chart" xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing" xmlns:odgm="http://opendope.org/SmartArt/DataHierarchy" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:dgm="http://schemas.openxmlformats.org/drawingml/2006/diagram" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:we="http://schemas.microsoft.com/office/webextensions/webextension/2010/11" xmlns:pvml="urn:schemas-microsoft-com:office:powerpoint" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:sl="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:comp="http://schemas.openxmlformats.org/drawingml/2006/compatibility" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart" xmlns:xvml="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:oda="http://opendope.org/answers" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:odc="http://opendope.org/conditions" xmlns:cdr="http://schemas.openxmlformats.org/drawingml/2006/chartDrawing" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns:odi="http://opendope.org/components" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:lc="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas" xmlns:odq="http://opendope.org/questions" xmlns:wetp="http://schemas.microsoft.com/office/webextensions/taskpanes/2010/11" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid">
<w:sdtPr>
    <w:rPr>
        <w:sz w:val="18"/>
        <w:szCs w:val="18"/>
    </w:rPr>
    <w:id w:val="1239367024"/>
    <w:placeholder>
        <w:docPart w:val="059F92C89F2F410BB7231E2BAA981321"/>
    </w:placeholder>
    <w:date>
        <w:dateFormat w:val="M/d/yyyy"/>
        <w:lid w:val="en-US"/>
        <w:storeMappedDataAs w:val="dateTime"/>
        <w:calendar w:val="gregorian"/>
    </w:date>
</w:sdtPr>
<w:sdtContent>
    <w:tc>
        <w:tcPr>
            <w:tcW w:w="1170" w:type="dxa"/>
        </w:tcPr>
        <w:p w:rsidRPr="007D4D1F" w:rsidR="00040B4E" w:rsidP="00040B4E" w:rsidRDefault="00040B4E">
            <w:pPr>
                <w:ind w:left="0" w:firstLine="0"/>
                <w:jc w:val="center"/>
                <w:cnfStyle w:val="000000000000"/>
                <w:rPr>
                    <w:sz w:val="18"/>
                    <w:szCs w:val="18"/>
                </w:rPr>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:sz w:val="18"/>
                    <w:szCs w:val="18"/>
                </w:rPr>
                <w:t>02/01/2019</w:t>
            </w:r>
        </w:p>
    </w:tc>
</w:sdtContent>
</w:sdt>

Method in the interface

default List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {

    List<Object> result = new ArrayList<>();

    if (obj instanceof JAXBElement)

        obj = ((JAXBElement<?>) obj).getValue();

    if (obj.getClass().equals(toSearch)) {

        result.add(obj);

    } else if (obj instanceof ContentAccessor) {

        List<?> children = ((ContentAccessor) obj).getContent();

        for (Object child : children) {

            result.addAll(getAllElementFromObject(child, toSearch));
        }

    }

    return result;
}

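1 answer:

Answer 0: (score: 0)

According to JasonPlutext's response, CTSdtCell does not implement ContentAccessor, so the recursive search never descends into it. Route through SdtElement instead and use its getSdtContent() method to reach the wrapped content.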
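A minimal sketch of that fix, assuming the only change needed is an extra branch in getAllElementFromObject. To keep it runnable without docx4j on the classpath, ContentAccessor, SdtElement, Tr, Tc and CTSdtCell below are simplified, hypothetical stand-ins that only mirror the shape of the real org.docx4j.wml types; the JAXBElement unwrapping from the question is left out for the same reason. In real code you would keep the original imports and unwrapping and add only the SdtElement branch.

```java
import java.util.ArrayList;
import java.util.List;

class SdtTraversalSketch {

    // Stand-ins for the docx4j interfaces (hypothetical, simplified).
    interface ContentAccessor { List<Object> getContent(); }
    interface SdtElement { ContentAccessor getSdtContent(); }

    // Generic container node used as a base for the stand-in element types.
    static class Node implements ContentAccessor {
        private final List<Object> content = new ArrayList<>();
        public List<Object> getContent() { return content; }
    }
    static class Tr extends Node {}
    static class Tc extends Node {}
    static class SdtContentCell extends Node {}

    // Like the real CTSdtCell, this is an SdtElement but NOT a ContentAccessor.
    static class CTSdtCell implements SdtElement {
        private final SdtContentCell sdtContent = new SdtContentCell();
        public ContentAccessor getSdtContent() { return sdtContent; }
    }

    static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<>();
        if (obj == null) {
            return result;
        }
        if (obj.getClass().equals(toSearch)) {
            result.add(obj);
        } else if (obj instanceof ContentAccessor) {
            for (Object child : ((ContentAccessor) obj).getContent()) {
                result.addAll(getAllElementFromObject(child, toSearch));
            }
        } else if (obj instanceof SdtElement) {
            // New branch: step into the content control's w:sdtContent.
            result.addAll(getAllElementFromObject(((SdtElement) obj).getSdtContent(), toSearch));
        }
        return result;
    }

    // Builds a row shaped like the failing ones: 3 plain cells, 4 wrapped in content controls.
    static Tr sampleRow() {
        Tr row = new Tr();
        for (int i = 0; i < 3; i++) {
            row.getContent().add(new Tc());
        }
        for (int i = 0; i < 4; i++) {
            CTSdtCell sdt = new CTSdtCell();
            sdt.getSdtContent().getContent().add(new Tc());
            row.getContent().add(sdt);
        }
        return row;
    }

    public static void main(String[] args) {
        Tr row = sampleRow();
        int cells = getAllElementFromObject(row, Tc.class).size();
        int controls = getAllElementFromObject(row, CTSdtCell.class).size();
        System.out.println("Cells " + cells + " Content control " + controls);
    }
}
```

With the extra branch, a search for Tc.class on the sample row reaches the four cells nested inside content controls as well as the three plain ones, while a search for CTSdtCell.class still stops at the control itself, as in the question's output.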
