Question

In pandas (the master branch or the upcoming 0.14), how do I find the index labels of the rows where a DataFrame has null values?

When I do:

df.isnull()

I get a boolean DataFrame the same size as df.

If I do:

df.isnull().index

I just get the index of the original df.

What I want is the index of the rows that have NaN entries (in some of the columns, or in all of them).

Answer 1

df.index[df.isnull().any(axis=1)]

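.any(axis=1) gives one True/False value per row, True when that row contains at least one NaN. You can then use this boolean Series to index df.index, which returns the labels of the rows where df has nulls.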
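The question asks about NaNs "in some columns, or in all columns", so here is a minimal sketch of both variants, plus restricting the check to a subset of columns. The DataFrame and the column names a, b, c are made up for illustration; the pattern is the same df.index[boolean_mask] indexing as above, only the reduction (.any vs .all) and the columns change.

```python
import numpy as np
import pandas as pd

# Hypothetical example data (not from the question)
df = pd.DataFrame({"a": [np.nan, 0, 1, np.nan, np.nan],
                   "b": [1, np.nan, 2, np.nan, np.nan],
                   "c": [5, 6, 7, 8, np.nan]})

mask = df.isnull()  # boolean DataFrame, same shape as df

# Rows with at least one NaN in any column
rows_any = df.index[mask.any(axis=1)]

# Rows where every column is NaN
rows_all = df.index[mask.all(axis=1)]

# Rows where both of the chosen columns (a and b) are NaN
rows_subset = df.index[mask[["a", "b"]].all(axis=1)]

print(list(rows_any))     # [0, 1, 3, 4]
print(list(rows_all))     # [4]
print(list(rows_subset))  # [3, 4]
```

Swapping .all for .any in the last line would instead select rows with a NaN in either of the chosen columns.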

Answer 2

I'd do it like this:

In [11]: df = pd.DataFrame([[np.nan, 1], [0, np.nan], [1, 2]])

In [12]: df
Out[12]:
    0   1
0 NaN   1
1   0 NaN
2   1   2

In [13]: pd.isnull(df.values)
Out[13]:
array([[ True, False],
       [False,  True],
       [False, False]], dtype=bool)

In [14]: pd.isnull(df.values).any(1)
Out[14]: array([ True,  True, False], dtype=bool)

In [15]: np.nonzero(pd.isnull(df.values).any(1))
Out[15]: (array([0, 1]),)

In [16]: df.index[np.nonzero(pd.isnull(df.values).any(1))]
Out[16]: Int64Index([0, 1], dtype='int64')
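A self-contained version of the same steps, as a sketch assuming a pandas release newer than 0.14 where DataFrame.to_numpy() is available. It uses np.flatnonzero, which returns the positions as a flat array instead of the one-element tuple that np.nonzero returns. The string index labels x, y, z are hypothetical and only there so that positions and labels visibly differ.

```python
import numpy as np
import pandas as pd

# Same values as the session above, with a made-up non-default index
df = pd.DataFrame([[np.nan, 1], [0, np.nan], [1, 2]], index=["x", "y", "z"])

# Boolean ndarray: True where a value is missing
null_mask = pd.isnull(df.to_numpy())

# One True/False per row: does the row contain at least one NaN?
row_has_null = null_mask.any(axis=1)

# Integer positions of those rows (a 1-D array, not a tuple)
positions = np.flatnonzero(row_has_null)

# Map the positions back to index labels
labels = df.index[positions]

print(positions.tolist())  # [0, 1]
print(labels.tolist())     # ['x', 'y']
```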

To look at some timings, with a slightly larger df:

In [21]: df = pd.DataFrame([[np.nan, 1], [0, np.nan], [1, 2]] * 1000)

In [22]: %timeit np.nonzero(pd.isnull(df.values).any(1))
10000 loops, best of 3: 85.8 µs per loop

In [23]: %timeit df.index[df.isnull().any(1)]
1000 loops, best of 3: 629 µs per loop

If you care about the index labels (rather than the positions):

In [24]: %timeit df.index[np.nonzero(pd.isnull(df.values).any(1))]
10000 loops, best of 3: 172 µs per loop
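To reproduce the comparison outside IPython, here is a sketch using the standard timeit module instead of %timeit. The function names labels_via_pandas and labels_via_numpy are hypothetical, and to_numpy() stands in for .values (assuming a recent pandas). It first checks that both approaches return the same labels; the numbers it prints depend on the machine and pandas version and will not match the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the timing example above: 3000 rows, 2 columns
df = pd.DataFrame([[np.nan, 1], [0, np.nan], [1, 2]] * 1000)


def labels_via_pandas(frame):
    # Boolean Series per row, then boolean indexing on the index
    return frame.index[frame.isnull().any(axis=1)]


def labels_via_numpy(frame):
    # Work on the raw ndarray, then map positions back to labels
    positions = np.flatnonzero(pd.isnull(frame.to_numpy()).any(axis=1))
    return frame.index[positions]


# Both approaches must select exactly the same labels
assert labels_via_pandas(df).equals(labels_via_numpy(df))

N_RUNS = 200
for func in (labels_via_pandas, labels_via_numpy):
    seconds = timeit.timeit(lambda: func(df), number=N_RUNS)
    print(f"{func.__name__}: {seconds / N_RUNS * 1e6:.1f} µs per call")
```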
