Remove all escape sequences within specific code blocks

Time: 2016-12-11 21:59:42

Tags: python regex

I have an HTML snippet that looks like this:

<code class="inline">\n     object.__getattribute__\n    </code>\n    and\n    <code class="inline">\n     super.__getattribute__\n    </code>\n    peek\nin the\n    <code class="inline">\n     __dict__\n    </code>\n    of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n    <code class="inline">\n     __getdescriptor__\n    </code>\n    method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n    \n     super\n    </a>\n\n    \n    </a>\n    object.\n   </p>\n<p>\n    That is, the MRO walking loop in\n  

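Question

How do I target only the \n sequences inside the <code> tags?

What I tried

I tried using re.sub(), but I keep replacing everything rather than just the \n inside the <code> tags.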

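2 answers:

Answer 0 (score: 2)

Since the input is HTML, why not use a specialized tool: an HTML parser?

Here is an example of how to find all code elements and replace \n with an empty string, using the BeautifulSoup HTML parser: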

from bs4 import BeautifulSoup

data = """<code class="inline">\n     object.__getattribute__\n    </code>\n    and\n    <code class="inline">\n     super.__getattribute__\n    </code>\n    peek\nin the\n    <code class="inline">\n     __dict__\n    </code>\n    of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n    <code class="inline">\n     __getdescriptor__\n    </code>\n    method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n    \n     super\n    </a>\n\n    \n    </a>\n    object.\n   </p>\n<p>\n    That is, the MRO walking loop in\n"""

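soup = BeautifulSoup(data, "html.parser")
for code in soup("code"):
    # .string is None when <code> has no single text child (e.g. mixed nested tags)
    if code.string is not None:
        code.string = code.string.replace("\n", "")

print(soup)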
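The loop above relies on `.string`, which returns None when a <code> element mixes text with nested tags. A minimal sketch of a variant that walks every text node inside each <code> element instead; the function name `strip_newlines_in_code_bs4` and the sample string are made up for illustration, and it assumes bs4 4.4 or later for the `string=True` filter:

```python
from bs4 import BeautifulSoup


def strip_newlines_in_code_bs4(html):
    """Remove newlines from text nodes inside <code>, leaving other text alone."""
    soup = BeautifulSoup(html, "html.parser")
    for code in soup.find_all("code"):
        # Copy the result into a list first, because replace_with()
        # modifies the tree while we iterate over it.
        for node in list(code.find_all(string=True)):
            node.replace_with(node.replace("\n", ""))
    return str(soup)


# Hypothetical sample: a <code> element with a nested <b> tag.
sample = '<p>a\nb <code class="inline">\n x.<b>y\n</b>\n</code>\nc</p>'
result = strip_newlines_in_code_bs4(sample)
print(repr(result))
```

Newlines outside the <code> element (between "a" and "b", and before "c") are left in place.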

Answer 1 (score: 1)

text = '<code class="inline">\n     object.__getattribute__\n    </code>\n    and\n    <code class="inline">\n     super.__getattribute__\n    </code>\n    peek\nin the\n    <code class="inline">\n     __dict__\n    </code>\n    of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n    <code class="inline">\n     __getdescriptor__\n    </code>\n    method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n    \n     super\n    </a>\n\n    \n    </a>\n    object.\n   </p>\n<p>\n    That is, the MRO walking loop in\n '

# Note: this removes every newline in the text, not only those inside <code>
print(text.replace('\n', ''))
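If only the newlines inside <code> elements should be removed, as the question asks, one possible regex-based sketch is to pass a callback to re.sub() so the replacement is applied only to the matched element body. The names `CODE_BLOCK` and `strip_newlines_in_code_re` are invented for this example, and the pattern assumes <code> elements are not nested inside each other; for arbitrary HTML, a parser is the safer choice.

```python
import re

# Capture the opening tag, the body, and the closing tag separately.
# DOTALL lets "." match newlines; the lazy ".*?" stops at the first </code>.
CODE_BLOCK = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.DOTALL | re.IGNORECASE)


def strip_newlines_in_code_re(html):
    """Remove newlines only between <code ...> and </code>."""
    def _clean(match):
        opening, body, closing = match.groups()
        return opening + body.replace("\n", "") + closing

    return CODE_BLOCK.sub(_clean, html)


sample = '<code class="inline">\n     __dict__\n    </code>\n    of classes\n'
cleaned = strip_newlines_in_code_re(sample)
print(repr(cleaned))
```

The newlines after </code> survive, while those inside the element are gone.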