Python regular expressions

Date: 2011-05-10 10:15:38

Tags: python

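I'm new to Python. I have an array of words, and each word has to be checked to see whether it contains any special characters or digits. If it does, I have to skip that word. How can I do this?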

3 answers:

Answer 0 (score: 4)

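Does it have to be a regular expression? If not, you can use the isalpha() string method, which returns True only if the string is non-empty and every character in it is alphabetic.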
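If a regular expression is required after all, the same check can be sketched with re.fullmatch, which succeeds only when the pattern matches the entire string. This is a minimal sketch, assuming "allowed" means ASCII letters only; the pattern [A-Za-z]+ and the names LETTERS_ONLY and is_plain_word are illustrative choices, not taken from the thread.

```python
import re

# Illustrative pattern: one or more ASCII letters and nothing else.
# fullmatch() requires the whole word to match, not just a prefix.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")

def is_plain_word(word):
    """Return True if word consists solely of ASCII letters (hypothetical helper)."""
    return LETTERS_ONLY.fullmatch(word) is not None

words = ["hello", "hello2", "?hello", "?hello2"]
kept = [w for w in words if is_plain_word(w)]
print(kept)  # ['hello']
```

Note the design difference: in Python 3, isalpha() accepts any Unicode letter ('café'.isalpha() is True), whereas this pattern rejects 'café' because 'é' is outside A-Z and a-z.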

Answer 1 (score: 2)

My reading of the question is that you want to discard any word that contains a non-alphabetic character. Try the following:

>>> array = ['hello', 'hello2', '?hello', '?hello2']
>>> filtered = list(filter(str.isalpha, array))  # filter() returns an iterator in Python 3
>>> print(filtered)
['hello']

You can also write it as a list comprehension:

>>> filtered = [word for word in array if word.isalpha()]
>>> print(filtered)
['hello']
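The question phrases the task as "skip the word", which maps directly onto a plain loop with continue. The sketch below behaves like the list comprehension above; the extra sample words are illustrative only and show that isalpha() also rejects apostrophes and hyphens, while accepting non-ASCII letters.

```python
# Illustrative sample words (not from the original question)
words = ["hello", "hello2", "?hello", "don't", "well-known", "naïve"]

kept = []
for word in words:
    # Skip any word containing a digit, punctuation, whitespace, etc.
    if not word.isalpha():
        continue
    kept.append(word)

print(kept)  # ['hello', 'naïve']
```

If words such as "don't" or "well-known" should survive, isalpha() alone is too strict, and an explicit set of allowed characters is a better fit.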

Answer 2 (score: 1)

If you only want to exclude a few characters, use a blacklist; otherwise, use a whitelist.

import string

abadword = """aaaa
bbbbb"""  # a triple-quoted string spanning two lines, so it contains '\n'
words = ["oneGoodWord", "a,bc", abadword, "xx\n", '123', "gone", "tab\ttab", "theEnd.", "anotherGoodWord"]

bad = list(string.punctuation)  # string.punctuation == '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
bad += ['\n', '\t', '1']  # add any other characters you don't want
bad += ['one']  # redundant: the check below compares single characters, so a multi-character entry like 'one' can never match

print(bad)  # bad = ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~', '\n', '\t', '1', 'one']

bad = set(bad)

def keep(word):
    # True if the word has no characters in common with the bad set
    return bad.isdisjoint(word)

print("good words:")
print(list(filter(keep, words)))  # prints ['oneGoodWord', 'gone', 'anotherGoodWord']
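The answer recommends a whitelist when most characters are unwanted, but only shows the blacklist. A minimal whitelist sketch, assuming the allowed characters are the ASCII letters in string.ascii_letters; the helper name is_whitelisted is hypothetical:

```python
import string

# Whitelist: only these characters may appear in a word.
# Assumption: ASCII letters only; extend the set (e.g. with "'-") to allow more.
good = set(string.ascii_letters)

def is_whitelisted(word):
    # set(word) <= good means every character of the word is allowed.
    # bool(word) rejects the empty string, whose empty set would otherwise pass.
    return bool(word) and set(word) <= good

words = ["oneGoodWord", "a,bc", "xx\n", "123", "gone", "tab\ttab", "theEnd.", "anotherGoodWord"]
print([w for w in words if is_whitelisted(w)])  # ['oneGoodWord', 'gone', 'anotherGoodWord']
```

With a whitelist, any character you did not anticipate (a stray tab, a Unicode symbol) is rejected by default, which is why it is the safer choice when the set of unwanted characters is large or open-ended.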