I ran into a problem when parsing XML with BeautifulSoup and Python.
For example, take this part of the XML content:
from bs4 import BeautifulSoup

xml = '''<line b="498" baseline="488" l="520" r="1248" t="456">
<formatting lang="EnglishUnitedStates">
$
<price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">
25.00
</price>
.
</formatting>
</line>'''
soup = BeautifulSoup(xml, "lxml-xml")
tag = soup.price
tag['data-value'] = "$25.00"
tag.string = "$25.00"
outside_tag_str = str(tag.find_parent())
new_outside_tag_str = outside_tag_str.replace("$<", "<")
print(new_outside_tag_str)
This prints:
<formatting lang="EnglishUnitedStates">
$
<price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="$25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">$25.00</price>
.
</formatting>
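As you can see, I want the dollar sign to be inside the tag, but my code leaves a dollar sign both inside and outside the tag. What am I doing wrong? I want to get: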
<formatting lang="EnglishUnitedStates">
<price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="$25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">$25.00</price>
.
</formatting>
Any help would be greatly appreciated, thanks.
Answer 0 (score: 0)
Replace this:
new_outside_tag_str = outside_tag_str.replace("$<", "<")
with this:
new_outside_tag_str = outside_tag_str.replace(" $", "")
Why?
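Because what you need to remove is exactly " $" (a space followed by a dollar sign), replacing it with an empty string. The string you were targeting, "$<", never occurs: in the printed output the stray "$" and the <price> tag are separated by whitespace, so str.replace() finds nothing to match and returns the text unchanged. The "$" inside the tag is safe either way, because it follows a quote or ">" rather than a space. (Matching " $" relies on the stray "$" being preceded by a space, such as indentation in the original XML; if the "$" starts its line with no leading space, the same idea applies but the pattern has to match the whitespace that is actually there, e.g. "$\n".)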
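To see the mismatch concretely, here is a minimal pure-Python sketch that works on the string printed in the question (the <price> attributes are trimmed for brevity, an assumption of this example). It also shows a regex alternative, which is not part of the original answer: it removes a "$" plus any whitespace after it, but only when the next thing is the opening <price tag, so it works whether or not the "$" is indented.

```python
import re

# The string printed in the question, with <price> attributes trimmed.
# Note the newline between the stray "$" and "<price".
outside_tag_str = (
    '<formatting lang="EnglishUnitedStates">\n'
    '$\n'
    '<price data-value="$25.00">$25.00</price>\n'
    '.\n'
    '</formatting>'
)

# "$<" never occurs, so str.replace() returns the text unchanged.
print("$<" in outside_tag_str)                                 # False
print(outside_tag_str.replace("$<", "<") == outside_tag_str)   # True

# Remove "$" followed by any whitespace, only when "<price" comes next.
# The lookahead checks for "<price" without consuming it, so the tag stays.
pattern = re.compile(r"\$\s*(?=<price\b)")
new_outside_tag_str = pattern.sub("", outside_tag_str)
print(new_outside_tag_str)
```

The "$" characters inside data-value and the tag text are untouched, because each is followed by "2" rather than by whitespace and "<price".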
So the full code becomes:
from bs4 import BeautifulSoup

xml = '''<line b="498" baseline="488" l="520" r="1248" t="456">
<formatting lang="EnglishUnitedStates">
$
<price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">
25.00
</price>
.
</formatting>
</line>'''
soup = BeautifulSoup(xml, "lxml-xml")
tag = soup.price
tag['data-value'] = "$25.00"
tag.string = "$25.00"
outside_tag_str = str(tag.find_parent())
new_outside_tag_str = outside_tag_str.replace(" $", "")
print(new_outside_tag_str)
OUTPUT:
<formatting lang="EnglishUnitedStates">
<price appendorder="10" class="extraction-tag" confidence="1/1" data-original-title="Price" data-toggle="tooltip" data-value="$25.00" style="background-color: rgb(192, 185, 178);" tagorder="10">$25.00</price>
.
</formatting>
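A sturdier option, also not from the original answer, is to edit the parse tree instead of the serialized string. The stray "$" lives in the text node immediately before <price>, so you can strip it there and never depend on the exact whitespace. This is a minimal sketch, assuming bs4 and lxml are installed (the "lxml-xml" parser needs lxml) and using the same XML with the <price> attributes trimmed:

```python
from bs4 import BeautifulSoup, NavigableString

xml = '''<line b="498" baseline="488" l="520" r="1248" t="456">
<formatting lang="EnglishUnitedStates">
$
<price data-value="25.00">
25.00
</price>
.
</formatting>
</line>'''

soup = BeautifulSoup(xml, "lxml-xml")
tag = soup.price
tag["data-value"] = "$25.00"
tag.string = "$25.00"

# The text node just before <price> holds the stray "$" (here "\n$\n").
prev = tag.previous_sibling
if isinstance(prev, NavigableString) and prev.rstrip().endswith("$"):
    # Drop the trailing "$" and the whitespace after it; keep what precedes it.
    prev.replace_with(NavigableString(prev.rstrip()[:-1]))

new_outside_tag_str = str(tag.find_parent())
print(new_outside_tag_str)
```

Because the change is made to the tree, soup itself is updated as well, so str(soup) or later find() calls see the cleaned text; the string-replace approach only changes the copy you print.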