Removing tags from a nested list

Time: 2016-02-27 09:05:29

Tags: python

I have a nested list of this form:

[[u' (SBAR - TMP (WHADVP-1 (WRB When)) (S (NP-SBJ (PRP it)))']
[u'(NP-SBJ (DT the) (NNS traders))']
[u'(NP (NNS orders) (S (-NONE- *ICH*-2)))']
[u'(PP-MNR (IN via) (NP (NNS computers)))']
[u'(S-2\n  (NP-SBJ (-NONE- *))\n  (VP\n    (TO to)']]

I want to remove the tags and get this output:

((when it)(the traders)(orders)(via computers))

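Can anyone tell me how to do this in Python?

1 answer:

Answer 0: (score: 0)

You can grab everything that is not written in all uppercase. I don't know exactly what you consider a tag, so you could start with something along these lines: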

import re

arr = [[u' (SBAR - TMP (WHADVP-1 (WRB When)) (S (NP-SBJ (PRP it)))'],
    [u'(NP-SBJ (DT the) (NNS traders))'],
    [u'(NP (NNS orders) (S (-NONE- *ICH*-2)))'],
    [u'(PP-MNR (IN via) (NP (NNS computers)))'],
    [u'(S-2\n  (NP-SBJ (-NONE- *))\n  (VP\n    (TO to)']]

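# Keep only runs that start with one letter followed by lowercase letters or
# spaces, which skips all-uppercase labels such as NP-SBJ or WRB.
res = [' '.join(re.findall(r'(\b[A-Za-z][a-z ]+\b)', s[0])) for s in arr]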

print(res)
# [u'When it', u'the traders', u'orders', u'via computers', u'to']
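
If by "tags" you mean the label that opens each bracket (that is an assumption on my part), a sketch that picks out only the `(TAG word)` leaves may hold up better than filtering on case: it keeps single-letter and capitalized words, and it drops the `-NONE-` traces. `LEAF` and `leaf_words` are names made up for this illustration, and the last lines build the parenthesized string from the question.

```python
import re

# Hypothetical pattern: a "(TAG token)" leaf, i.e. a label followed by a
# single token, with no nested parentheses inside.
LEAF = re.compile(r'\(([^\s()]+)\s+([^\s()]+)\)')

def leaf_words(tree):
    """Return the leaf tokens of one bracketed string, skipping -NONE- traces."""
    return [word for tag, word in LEAF.findall(tree) if tag != '-NONE-']

arr = [[u' (SBAR - TMP (WHADVP-1 (WRB When)) (S (NP-SBJ (PRP it)))'],
    [u'(NP-SBJ (DT the) (NNS traders))'],
    [u'(NP (NNS orders) (S (-NONE- *ICH*-2)))'],
    [u'(PP-MNR (IN via) (NP (NNS computers)))'],
    [u'(S-2\n  (NP-SBJ (-NONE- *))\n  (VP\n    (TO to)']]

# One lowercased phrase per inner list.
phrases = [' '.join(leaf_words(s[0])).lower() for s in arr]

# Wrap each non-empty phrase in parentheses, then wrap the whole thing.
result = '(' + ''.join('({})'.format(p) for p in phrases if p) + ')'

print(result)
# ((when it)(the traders)(orders)(via computers)(to))
```

The fifth string is cut off in the question, so it still contributes `to`; filter that entry out if you only want the four phrases shown in the expected output.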