Remove NaN rows from a numpy array that also contains objects

Time: 2019-07-24 15:23:42

Tags: python numpy

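Given a numpy array, I want to identify which rows contain NaN values when the rows also hold objects. For example, a row might contain float values and a list.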

For an input array arr, I tried arr[~np.isnan(arr).any(axis=1)], but then got this error message:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

1 answer:

Answer 0: (score: 1)

In [314]: x = np.array([[1, [2,3], np.nan], [3, [5,6,7], 8]], dtype=object)  # explicit dtype=object is required on NumPy >= 1.24
In [315]: x                                                                                                  
Out[315]: 
array([[1, list([2, 3]), nan],
       [3, list([5, 6, 7]), 8]], dtype=object)
In [316]: x.shape                                                                                            
Out[316]: (2, 3)
In [317]: x[0]                                                                                               
Out[317]: array([1, list([2, 3]), nan], dtype=object)
In [318]: x[1]                                                                                               
Out[318]: array([3, list([5, 6, 7]), 8], dtype=object)

isnan works on float dtype arrays; an object dtype array cannot be converted to that type:

In [320]: np.isnan(x)                                                                                        
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-320-3b2be83a8ed7> in <module>
----> 1 np.isnan(x)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

However, we can use an `is np.nan` test to check the elements one by one:

In [325]: np.frompyfunc(lambda i: i is np.nan,1,1)(x)                                                        
Out[325]: 
array([[False, False, True],
       [False, False, False]], dtype=object)

frompyfunc returns an object dtype array; let's convert it to bool:

In [328]: np.frompyfunc(lambda i: i is np.nan,1,1)(x).astype(bool)                                           
Out[328]: 
array([[False, False,  True],
       [False, False, False]])
In [329]: np.any(_, axis=1)           # test whole rows                                                                       
Out[329]: array([ True, False])
In [330]: x[~_, :]                    # use that as mask to keep other rows                                                      
Out[330]: array([[3, list([5, 6, 7]), 8]], dtype=object)
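One caveat with the `is np.nan` test: it is an identity check, so it only matches the `np.nan` object itself and misses NaNs created some other way, such as `float('nan')`. Here is a minimal sketch of the same steps wrapped in a function, using a value-based test instead. The names `is_nan_scalar` and `drop_nan_rows` are made up for this illustration and are not part of numpy.

```python
import math
import numpy as np

def is_nan_scalar(value):
    """Return True only for float NaN values (Python float or NumPy floating)."""
    # Lists, ints and other objects are rejected before math.isnan is called.
    return isinstance(value, (float, np.floating)) and math.isnan(value)

def drop_nan_rows(arr):
    """Return the rows of a 2-D object array that contain no NaN element."""
    # frompyfunc applies the Python test per element and returns object dtype,
    # so the result is cast to bool before it is used as a mask.
    mask = np.frompyfunc(is_nan_scalar, 1, 1)(arr).astype(bool)
    return arr[~mask.any(axis=1)]

x = np.array([[1, [2, 3], np.nan], [3, [5, 6, 7], 8]], dtype=object)
clean = drop_nan_rows(x)
print(clean)
```

For the sample array this should keep only the second row, the same result as the session above.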

pandas' isnull, suggested in another answer, can do something similar by testing element by element:

In [335]: pd.isnull(x)                                                                                       
Out[335]: 
array([[False, False,  True],
       [False, False, False]])