How to move text from outside a tag to inside the tag (Python)

Time: 2019-03-18 08:11:29

Tags: python xml beautifulsoup

I ran into a problem while parsing XML with BeautifulSoup and Python.

Take, for example, the following fragment of XML:

from bs4 import BeautifulSoup

xml = '''<line b="498" baseline="488" l="520" r="1248" t="456">
     <formatting lang="EnglishUnitedStates">
       $
      <price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">
       25.00
      </price>
      .
     </formatting>
    </line>'''

soup = BeautifulSoup(xml, "lxml-xml")
tag = soup.price
tag['data-value'] = "$25.00"
tag.string = "$25.00"
outside_tag_str = str(tag.find_parent())
new_outside_tag_str = outside_tag_str.replace("$<", "<")
print(new_outside_tag_str)


#prints: <formatting lang="EnglishUnitedStates">
      $
      <price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="$25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">$25.00</price>
      .
     </formatting>

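As you can see, I want the dollar sign to end up inside the tag, but my code returns a dollar sign both inside and outside the tag. What am I doing wrong? I would like to get back: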

<formatting lang="EnglishUnitedStates">
  <price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="$25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">$25.00</price>
  .
 </formatting>

Any help would be greatly appreciated, thanks.

1 answer:

Answer 0 (score: 0)

Replace this:

new_outside_tag_str = outside_tag_str.replace("$<", "<")

with this:

new_outside_tag_str = outside_tag_str.replace(" $", "")

Why?

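Because the stray dollar sign is followed by a newline and indentation rather than directly by "<", the string "$<" never occurs in the serialized output, so the original replace() call changed nothing. You need to replace exactly " $" (a space followed by a dollar sign) with an empty string. The two dollar signs inside the tag are left alone because they follow a quote and a ">" rather than a space.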
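To make the difference between the two patterns concrete, here is a minimal sketch that checks both against a shortened stand-in for the serialized parent tag. The variable names and the trimmed attribute list are assumptions made for illustration; only the whitespace layout around the dollar sign mirrors the question.

```python
# Shortened stand-in for str(tag.find_parent()); attributes are trimmed
# for readability, the whitespace layout follows the question's output.
serialized = (
    '<formatting lang="EnglishUnitedStates">\n'
    '       $\n'
    '      <price data-value="$25.00">$25.00</price>\n'
    '      .\n'
    '     </formatting>'
)

# "$<" never occurs: a newline and indentation sit between "$" and "<price".
has_old_pattern = "$<" in serialized

# " $" occurs once: only the stray dollar sign is preceded by a space.
has_new_pattern = " $" in serialized

# Removing " $" drops the stray dollar sign and keeps the two inside the tag.
cleaned = serialized.replace(" $", "")

print(has_old_pattern, has_new_pattern)  # prints: False True
print(cleaned)
```

Note that this only works as long as no other dollar sign in the document happens to follow a space.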

Hence:

from bs4 import BeautifulSoup

xml = '''<line b="498" baseline="488" l="520" r="1248" t="456">
     <formatting lang="EnglishUnitedStates">
       $
      <price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">
       25.00
      </price>
      .
     </formatting>
    </line>'''

soup = BeautifulSoup(xml, "lxml-xml")
tag = soup.price
tag['data-value'] = "$25.00"
tag.string = "$25.00"
outside_tag_str = str(tag.find_parent())
new_outside_tag_str = outside_tag_str.replace(" $", "")
print(new_outside_tag_str)

OUTPUT:

<formatting lang="EnglishUnitedStates">

      <price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="$25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">$25.00</price>
      .
     </formatting>
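String replacement on the serialized output depends on the exact whitespace around the dollar sign. As a possible alternative (not part of the answer above), the stray text node can be edited in the parse tree before serializing. The sketch below makes several assumptions: the XML is trimmed to the relevant attributes, "html.parser" is swapped in for "lxml-xml" so that it runs without lxml installed, and the names price, before and result are purely illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed version of the question's XML (most attributes omitted).
xml = '''<line>
  <formatting lang="EnglishUnitedStates">
    $
    <price data-value="25.00">
      25.00
    </price>
    .
  </formatting>
</line>'''

# Assumption: "html.parser" is used only to avoid the lxml dependency.
soup = BeautifulSoup(xml, "html.parser")
price = soup.find("price")

# The stray "$" lives in the text node immediately before <price>.
before = price.previous_sibling
if isinstance(before, NavigableString) and before.strip() == "$":
    # Keep the surrounding whitespace, drop only the dollar sign.
    before.replace_with(before.replace("$", ""))

# Put the dollar sign inside the tag and its attribute instead.
price["data-value"] = "$25.00"
price.string = "$25.00"

result = str(price.parent)
print(result)
```

Because the check looks only at the sibling directly before the price tag, dollar signs elsewhere in the document are not touched.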