是否可以将字符偏移信息(对峙注释)转换为NLTK树结构?

时间:2019-06-10 05:13:45

标签: tree nltk offset brat webanno

根据this thread,似乎可以将NLTK树对象转换为隔离注释格式。反过来也会起作用吗? (即对峙注释格式-> NLTK树对象

例如,是否可以将(1)XMI文件转换为(2)下面的NLTK树对象?

(1)XMI(字符偏移量信息,用Webanno注释)

<?xml version="1.0" encoding="UTF-8"?><xmi:XMI xmlns:pos="http:///de/tudarmstadt/ukp/dkpro/core/api/lexmorph/type/pos.ecore" xmlns:tcas="http:///uima/tcas.ecore" xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore" xmlns:tweet="http:///de/tudarmstadt/ukp/dkpro/core/api/lexmorph/type/pos/tweet.ecore" xmlns:morph="http:///de/tudarmstadt/ukp/dkpro/core/api/lexmorph/type/morph.ecore" xmlns:dependency="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/dependency.ecore" xmlns:type5="http:///de/tudarmstadt/ukp/dkpro/core/api/semantics/type.ecore" xmlns:type7="http:///de/tudarmstadt/ukp/dkpro/core/api/transform/type.ecore" xmlns:type6="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type.ecore" xmlns:type2="http:///de/tudarmstadt/ukp/dkpro/core/api/metadata/type.ecore" xmlns:type3="http:///de/tudarmstadt/ukp/dkpro/core/api/ner/type.ecore" xmlns:type4="http:///de/tudarmstadt/ukp/dkpro/core/api/segmentation/type.ecore" xmlns:type="http:///de/tudarmstadt/ukp/dkpro/core/api/coref/type.ecore" xmlns:constituent="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/constituent.ecore" xmlns:chunk="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/chunk.ecore" xmi:version="2.0">
<cas:NULL xmi:id="0"/>
<type2:DocumentMetaData xmi:id="1" sofa="12" begin="0" end="28" language="x-unspecified" documentTitle="visualization-example2.txt" documentId="admin" documentUri="file:/C:/Users/Administrator/.webanno/repository/project/1/document/14/source/visualization-example2.txt" collectionId="file:/C:/Users/Administrator/.webanno/repository/project/1/document/14/source/" documentBaseUri="file:/C:/Users/Administrator/.webanno/repository/project/1/document/14/source/" isLastSegment="false"/>
<type4:Sentence xmi:id="19" sofa="12" begin="0" end="28"/>
<type4:Token xmi:id="23" sofa="12" begin="0" end="1"/>
<type4:Token xmi:id="32" sofa="12" begin="2" end="6"/>
<type4:Token xmi:id="41" sofa="12" begin="7" end="8"/>
<type4:Token xmi:id="50" sofa="12" begin="9" end="12"/>
<type4:Token xmi:id="59" sofa="12" begin="13" end="17"/>
<type4:Token xmi:id="68" sofa="12" begin="18" end="22"/>
<type4:Token xmi:id="77" sofa="12" begin="23" end="27"/>
<type4:Token xmi:id="86" sofa="12" begin="27" end="28"/>
<chunk:Chunk xmi:id="95" sofa="12" begin="0" end="1" chunkValue="SUB"/>
<chunk:Chunk xmi:id="100" sofa="12" begin="2" end="28" chunkValue="PRD"/>
<chunk:Chunk xmi:id="105" sofa="12" begin="2" end="6" chunkValue="VERB"/>
<chunk:Chunk xmi:id="110" sofa="12" begin="7" end="27" chunkValue="OBJ"/>
<chunk:Chunk xmi:id="115" sofa="12" begin="7" end="12" chunkValue="HED"/>
<chunk:Chunk xmi:id="120" sofa="12" begin="13" end="27" chunkValue="PP"/>
<type2:TagsetDescription xmi:id="125" sofa="12" begin="0" end="0" layer="de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency" name="UD Universal Dependencies"/>
<type2:TagsetDescription xmi:id="132" sofa="12" begin="0" end="0" layer="de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity" name="Named Entity tags"/>
<type2:TagsetDescription xmi:id="139" sofa="12" begin="0" end="0" layer="de.tudarmstadt.ukp.dkpro.core.api.transform.type.SofaChangeAnnotation" name="Operation"/>
<type2:TagsetDescription xmi:id="146" sofa="12" begin="0" end="0" layer="de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS" name="UD Universal POS tags"/>
<cas:Sofa xmi:id="12" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="I want a dog with long hair."/>
<cas:View sofa="12" members="1 19 23 32 41 50 59 68 77 86 95 100 105 110 115 120 125 132 139 146"/></xmi:XMI>

(2)NLTK树对象(手动构造,在其他示例中可能为m-ary tree

tree = Tree('S', [Tree('SUB', ['I']), Tree('PRD', [Tree('VERB', ['want']), Tree('OBJ', [Tree('HED', ['a dog']), Tree('PP', ['with long hair.'])])])])

我以前从未在计算机科学领域学习过数据结构,因此这对于解释为什么不可转换很有帮助。

0 个答案:

没有答案