Question

I have an HTML snippet that looks like this:

<code class="inline">\n     object.__getattribute__\n    </code>\n    and\n    <code class="inline">\n     super.__getattribute__\n    </code>\n    peek\nin the\n    <code class="inline">\n     __dict__\n    </code>\n    of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n    <code class="inline">\n     __getdescriptor__\n    </code>\n    method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n    \n     super\n    </a>\n\n    \n    </a>\n    object.\n   </p>\n<p>\n    That is, the MRO walking loop in\n

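Problem

How can I target only the \n inside the <code> tags?

What I tried

I tried using re.sub(), but I keep replacing everything rather than just the \n inside the <code> tags.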

Answer 1

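Since the input is HTML, why not use a tool built for the job: an HTML parser?

Here is an example that finds every code element and replaces \n with an empty string, using the BeautifulSoup HTML parser: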

from bs4 import BeautifulSoup

data = """<code class="inline">\n     object.__getattribute__\n    </code>\n    and\n    <code class="inline">\n     super.__getattribute__\n    </code>\n    peek\nin the\n    <code class="inline">\n     __dict__\n    </code>\n    of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n    <code class="inline">\n     __getdescriptor__\n    </code>\n    method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n    \n     super\n    </a>\n\n    \n    </a>\n    object.\n   </p>\n<p>\n    That is, the MRO walking loop in\n"""

soup = BeautifulSoup(data, "html.parser")
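# soup("code") is shorthand for soup.find_all("code").
for code in soup("code"):
    # .string can be None (for example, when the tag has several children), so guard against it.
    if code.string is not None:
        code.string = code.string.replace("\n", "")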

print(soup)
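
If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch, not part of the original answer: the names CodeNewlineStripper and strip_code_newlines are made up for illustration, it assumes the \n are real newline characters, and it drops comments and doctype declarations because it does not re-emit them.

```python
from html.parser import HTMLParser


class CodeNewlineStripper(HTMLParser):
    """Re-emit the HTML, dropping newlines only inside <code> elements."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.depth = 0  # number of <code> elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.depth += 1
        # Reuse the raw tag text so attributes are preserved verbatim.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "code" and self.depth > 0:
            self.depth -= 1
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text inside <code> loses its newlines.
        self.parts.append(data.replace("\n", "") if self.depth else data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_code_newlines(html):
    """Hypothetical helper: return html with newlines removed inside <code>."""
    parser = CodeNewlineStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling strip_code_newlines(data) on the string above should leave the newlines between paragraphs alone and only change the text inside each code element.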

Answer 2

text = '<code class="inline">\n     object.__getattribute__\n    </code>\n    and\n    <code class="inline">\n     super.__getattribute__\n    </code>\n    peek\nin the\n    <code class="inline">\n     __dict__\n    </code>\n    of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n    <code class="inline">\n     __getdescriptor__\n    </code>\n    method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n    \n     super\n    </a>\n\n    \n    </a>\n    object.\n   </p>\n<p>\n    That is, the MRO walking loop in\n '

# Note: this removes every newline in the string, not only those inside <code> tags.
print(text.replace('\n', ''))
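
Since the question started from re.sub(), here is a rough sketch of how a regex could be limited to the <code> elements by passing a function as the replacement. It assumes the \n are real newline characters, that <code> elements are not nested, and that no attribute value contains ">"; the names CODE_RE and strip_newlines_in_code are made up for illustration. For anything less regular, an HTML parser is the safer choice.

```python
import re

# Match one <code ...>...</code> element; DOTALL lets "." span newlines,
# and the non-greedy ".*?" stops at the first closing tag.
CODE_RE = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.DOTALL)


def strip_newlines_in_code(html):
    """Remove newlines only between <code> and </code>."""

    def fix(match):
        opening, body, closing = match.groups()
        return opening + body.replace("\n", "") + closing

    # re.sub calls fix() once per matched element; text outside is untouched.
    return CODE_RE.sub(fix, html)
```

With the text variable above, print(strip_newlines_in_code(text)) would keep the newlines in the surrounding prose and strip only those inside the code elements.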

Remove all escape sequences inside a specific code block
