DOCX: get the string that a comment refers to

Time: 2019-07-25 11:28:09

Tags: python xml openxml docx

This builds on the question "Extract DOCX Comments":

from lxml import etree
import zipfile

ooXMLns = {'w':'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}

def get_comments(docxFileName):
  docxZip = zipfile.ZipFile(docxFileName)
  commentsXML = docxZip.read('word/comments.xml')
  et = etree.XML(commentsXML)
  comments = et.xpath('//w:comment',namespaces=ooXMLns)
  for c in comments:
    # attributes:
    print(c.xpath('@w:author',namespaces=ooXMLns))
    print(c.xpath('@w:date',namespaces=ooXMLns))
    # string value of the comment:
    print(c.xpath('string(.)',namespaces=ooXMLns))

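How can I get the text that a comment refers to? To be clear, I am trying to extract the text that was commented on, not the text in the body of the comment itself.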
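
One possible direction, offered as a minimal sketch rather than a confirmed solution: in WordprocessingML the commented span is normally marked in word/document.xml (not word/comments.xml) by a w:commentRangeStart / w:commentRangeEnd pair whose w:id matches the w:id of the w:comment element. The sketch below walks the document in order and collects the w:t text found between each pair. The function names commented_text_from_xml and get_commented_text are hypothetical, and it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. It assumes every comment of interest has range markers; a comment anchored only by a w:commentReference would not appear in the result.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

def _w(tag):
    # Build a namespace-qualified name in ElementTree's '{uri}local' form.
    return '{%s}%s' % (W_NS, tag)

def commented_text_from_xml(document_xml):
    """Map comment id -> text between its commentRangeStart/End markers.

    Hypothetical helper: document_xml is the content of word/document.xml.
    """
    root = ET.fromstring(document_xml)
    open_ids = set()   # ids of comment ranges that are currently open
    pieces = {}        # comment id -> list of text fragments
    # iter() visits elements in document order, so range markers and
    # text runs are seen in the order they appear in the document.
    for el in root.iter():
        if el.tag == _w('commentRangeStart'):
            cid = el.get(_w('id'))
            open_ids.add(cid)
            pieces.setdefault(cid, [])
        elif el.tag == _w('commentRangeEnd'):
            open_ids.discard(el.get(_w('id')))
        elif el.tag == _w('t') and el.text:
            # A text run belongs to every range open at this point,
            # which also covers overlapping or nested comments.
            for cid in open_ids:
                pieces[cid].append(el.text)
    return {cid: ''.join(parts) for cid, parts in pieces.items()}

def get_commented_text(docxFileName):
    # Read the main document part from the .docx archive.
    with zipfile.ZipFile(docxFileName) as docxZip:
        return commented_text_from_xml(docxZip.read('word/document.xml'))
```

The returned dictionary is keyed by the comment id as a string, so it could be joined with the output of get_comments by reading each comment's w:id attribute. Text from separate paragraphs inside one range is concatenated without a separator in this sketch.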

0 answers:

No answers