Removing wiki markup from a string in Python

Time: 2012-06-16 04:40:51

Tags: python

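I have a string containing information downloaded from a Wikia page.

To parse its contents, how can I remove all of the wiki formatting from the page and keep only the raw text?

Here is an example of what might appear: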

#REDIRECT[[Blah]]

{{
I have some stuff in here
}}
[[I also have some stuff in here|and here]]
[[http://blehthisisfake.com Link to a fake website]]

<span class="plainlinks">This is quite useless. Why was [[this page]] even created?</span>

<nowiki>There are more HTML tags, they should probably all be stripped...</nowiki>

There is random text in here. bleh bleh bleh

I'm not sure what single [brackets] do, but they should be stripped too...

Expected output:

There is random text in here. bleh bleh bleh

I'm not sure what single do, but they should be stripped too...

Is there a module that can do this?

1 answer:

Answer 0 (score: 2)

A Google search for "python wiki parser" turns up this code, which strips and replaces the markup (see the source code at the link for details).
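For the specific sample in the question, where the markup should be removed together with its contents (link labels and the text inside the span and nowiki tags are dropped too), a few regular expressions from the standard `re` module may be enough. This is a minimal sketch, not the linked code: `strip_wiki_markup` is a hypothetical helper name, and it assumes templates (`{{...}}`) are not nested and HTML tags are properly paired. It is not a full MediaWiki parser.

```python
import re


def strip_wiki_markup(text: str) -> str:
    """Delete wiki markup *together with its contents*, as in the question's expected output.

    Hypothetical helper: a regex sketch, not a full MediaWiki parser.
    Assumes templates ({{...}}) are not nested and HTML tags are properly paired.
    """
    # Redirect directives such as "#REDIRECT[[Blah]]" (the whole line)
    text = re.sub(r"^#REDIRECT.*$", "", text, flags=re.MULTILINE | re.IGNORECASE)
    # Templates {{ ... }}, which may span several lines (non-nested only)
    text = re.sub(r"\{\{.*?\}\}", "", text, flags=re.DOTALL)
    # Paired tags plus everything between them, e.g. <span ...>...</span>, <nowiki>...</nowiki>
    text = re.sub(r"<(\w+)[^>]*>.*?</\1>", "", text, flags=re.DOTALL)
    # Any leftover unpaired tags, e.g. <br/>
    text = re.sub(r"<[^>]+>", "", text)
    # Links: [[target|label]] and [[url text]] first, then single [brackets]
    text = re.sub(r"\[\[.*?\]\]", "", text)
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Collapse repeated spaces left behind, trim each line, drop empty lines,
    # and separate the surviving lines with a blank line
    lines = [re.sub(r"[ \t]{2,}", " ", line).strip() for line in text.splitlines()]
    return "\n\n".join(line for line in lines if line)


# The sample text from the question
SAMPLE = """#REDIRECT[[Blah]]

{{
I have some stuff in here
}}
[[I also have some stuff in here|and here]]
[[http://blehthisisfake.com Link to a fake website]]

<span class="plainlinks">This is quite useless. Why was [[this page]] even created?</span>

<nowiki>There are more HTML tags, they should probably all be stripped...</nowiki>

There is random text in here. bleh bleh bleh

I'm not sure what single [brackets] do, but they should be stripped too...
"""

if __name__ == "__main__":
    print(strip_wiki_markup(SAMPLE))
```

On the sample above this prints the two lines from the expected output. The order of the substitutions matters: tags are removed before links so that `[[this page]]` inside the span disappears along with it. If you would rather keep the visible label of a link (`and here` in `[[target|and here]]`), that pattern would need a capture group instead of an empty replacement.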