Question

Good morning,

Given a dataframe containing text data, for example:

import pandas

df = pandas.DataFrame({
    'a':['first', 'second', 'third'],
    'b':['null', 'third', 'first']})

I can select the rows containing the word 'first' with:

df.a.str.contains('first') | df.b.str.contains('first')

which yields

0     True
1    False
2     True
dtype: bool

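To apply the same condition to dozens of columns I could use isin, but that does not seem to work once I replace 'first' with a regex such as regex = '(?=.*first)(?=.*second)'.

Is there a more pythonic and elegant way to select on multiple columns than chaining several single-column df.<column_name>.str.contains(regex) conditions with | in the code? Thanks.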

Answer 1

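Why not use applymap on the whole dataframe? This differs from working column by column, but (I hope) it makes it easier to apply if-else conditions to every cell: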

In [62]: l = ['first', 'second']

In [63]: df
Out[63]: 
        a      b
0   first   null
1  second  third
2   third  first

In [64]: df.applymap(lambda v: v in l)
Out[64]: 
       a      b
0   True  False
1   True  False
2  False   True
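The mask above is element-wise, while the question asks for rows. As a minimal sketch (not part of the original answer), the cell mask can be reduced to one boolean per row with any(axis=1). The names words, cell_mask, row_mask and selected are illustrative, and the list is narrowed to ['first'] so that the result matches the output shown in the question. DataFrame.isin is used here because it performs the same exact-match test as the lambda.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['first', 'second', 'third'],
    'b': ['null', 'third', 'first'],
})

# Illustrative list of exact values to look for
words = ['first']

# Element-wise exact-match mask, same shape as df
cell_mask = df.isin(words)

# A row is selected if at least one of its cells matched
row_mask = cell_mask.any(axis=1)
selected = df[row_mask]

print(row_mask)
print(selected)
```

Note that this only handles exact matches; it does not cover the regex case from the question.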

Update

(Thanks to @Pythonic for the update)

We can also pass a regular expression through applymap:

import re

regex = '(^fi)'
df.applymap(lambda v: bool(re.search(regex, v)))
Out[38]: 
       a      b
0   True  False
1  False  False
2  False   True
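To get from this cell mask to the row selection the question asks for, one possible sketch is to run str.contains on every column and then reduce with any(axis=1). This assumes every column holds strings; na=False is added so that missing values count as non-matches. It also sidesteps applymap, which recent pandas releases deprecate in favour of DataFrame.map. The names cell_mask and selected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['first', 'second', 'third'],
    'b': ['null', 'third', 'first'],
})

regex = r'^fi'

# Apply the vectorised regex test to each column in turn
# (assumes all columns contain strings)
cell_mask = df.apply(lambda col: col.str.contains(regex, na=False))

# Keep the rows where any column matched the pattern
selected = df[cell_mask.any(axis=1)]

print(selected)
```

Because the pattern is only evaluated per cell, a lookahead regex such as the one in the question would require both words to occur within the same cell.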

The following example uses re flags:

In [44]: df = pandas.DataFrame({
   ....:     'a':['First', 'second', 'NULL'], 
   ....:     'b':['null', 'third', 'first']})

In [45]: regex = re.compile('(^fi)', flags=re.IGNORECASE)

In [46]: df.applymap(lambda v: bool(re.search(regex, v)))
Out[46]: 
       a      b
0   True  False
1  False  False
2  False   True
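The same flags can also be passed straight to str.contains through its flags parameter, which avoids compiling the pattern by hand. The following is a sketch under the same assumption as before (all columns contain strings); cell_mask, row_mask and selected are illustrative names.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'a': ['First', 'second', 'NULL'],
    'b': ['null', 'third', 'first'],
})

# Case-insensitive match on every column; na=False treats missing values as non-matches
cell_mask = df.apply(
    lambda col: col.str.contains(r'^fi', flags=re.IGNORECASE, na=False)
)

# Reduce to one boolean per row and select the matching rows
row_mask = cell_mask.any(axis=1)
selected = df[row_mask]

print(selected)
```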

Pandas - select the rows of a dataframe that contain a specific regex in any column
