Question

I have the following table, and I want to drop every row whose col1 value contains "C".

     col1  col2
0       1     3
1       2     4
2    C345     3
3  A56665     4
4   34553     3
5  353535     4

The code below seems to consider only the rows whose col1 value is a str. Why is that?

import pandas as pd

d = {'col1': [1, 2, "C345", "A56665", 34553, 353535], 'col2': [3, 4,3, 4,3, 4]}
df = pd.DataFrame(data=d)
df.col1.astype(str)
print(df.dtypes)

print(df.loc[df.col1.str.contains("C") == False])

Result:

     col1  col2
3  A56665     4

Desired result:

     col1  col2
0       1     3
1       2     4
3  A56665     4
4   34553     3
5  353535     4

I am using Python 3.6 with pandas 0.23.4 and numpy 1.15.4.

Answer 1

If you check the output of str.contains, you will see that it returns missing values (NaN) for the numeric entries:

print(df.col1.str.contains("C"))
0      NaN
1      NaN
2     True
3    False
4      NaN
5      NaN
Name: col1, dtype: object

The solution is to pass the na parameter to str.contains and invert the boolean mask with ~:

print(df[~df.col1.str.contains("C", na=False)])
     col1  col2
0       1     3
1       2     4
3  A56665     4
4   34553     3
5  353535     4

Details:

print(df.col1.str.contains("C", na=False))
0    False
1    False
2     True
3    False
4    False
5    False
Name: col1, dtype: bool

print(~df.col1.str.contains("C", na=False))
0     True
1     True
2    False
3     True
4     True
5     True
Name: col1, dtype: bool
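
To make the cause explicit, here is a minimal sketch reusing the question's data. It assumes col1 stays an object column with mixed int and str values; the names mask_raw, kept_by_eq, and result are illustrative only. Comparing NaN with False yields False, so every numeric row fails the original `== False` filter:

```python
import pandas as pd

d = {'col1': [1, 2, "C345", "A56665", 34553, 353535], 'col2': [3, 4, 3, 4, 3, 4]}
df = pd.DataFrame(data=d)

# On an object column, .str.contains yields NaN for non-string entries.
mask_raw = df.col1.str.contains("C")

# NaN == False evaluates to False, so the numeric rows are dropped as well.
kept_by_eq = df.loc[mask_raw == False]

# na=False treats non-strings as "does not contain"; ~ then inverts the mask.
result = df[~df.col1.str.contains("C", na=False)]
print(result)
```

Only the row holding "A56665" survives the first filter, which matches the result shown in the question.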

Answer 2

Convert to string first, then proceed as usual:

df.loc[df.col1.astype(str).str.contains(r"C") == False]
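
As a hedged note on why the question's own `df.col1.astype(str)` line had no effect: astype returns a new Series and leaves the DataFrame untouched unless the result is used or assigned. A minimal sketch, with still_mixed, as_text, and filtered as illustrative names:

```python
import pandas as pd

d = {'col1': [1, 2, "C345", "A56665", 34553, 353535], 'col2': [3, 4, 3, 4, 3, 4]}
df = pd.DataFrame(data=d)

# The return value is discarded here, so df.col1 is unchanged.
df.col1.astype(str)
still_mixed = df.col1.map(type).nunique() > 1  # both int and str are still present

# Keep the converted Series and build the mask from it instead.
as_text = df.col1.astype(str)
filtered = df.loc[~as_text.str.contains("C")]
print(filtered)
```

Because only the mask is built from the converted values, the rows in filtered keep their original col1 values.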

Filtering a DataFrame by a column with mixed data types
