Replacing np.float64 nan in pandas

Time: 2020-09-26 11:13:08

Tags: python pandas numpy

I have a pandas DataFrame that looks like this:

>>> df.head()
            timestamp  count_200  count_201  count_503  count_504   mean_200    mean_201  mean_503  mean_504  count_500
0 2020-09-18 09:00:00     4932.0       51.0        NaN        NaN  59.501014   73.941176       0.0       0.0          0
1 2020-09-18 10:00:00     1697.0        9.0        NaN        NaN  57.807896   69.111111       0.0       0.0          0
2 2020-09-18 11:00:00     6895.0        6.0        2.0        1.0  54.037273   98.333333      33.0    1511.0          0
3 2020-09-18 12:00:00     2943.0       97.0        NaN        NaN  74.334353   74.268041       0.0       0.0          0
4 2020-09-18 13:00:00     2299.0       43.0        NaN        NaN  70.539800  102.302326       0.0       0.0          0

fillna does not replace the NaN values:

>>> df.fillna(0)
              timestamp  count_200  count_201  count_503  count_504    mean_200    mean_201    mean_503  mean_504  count_500
0   2020-09-18 09:00:00     4932.0       51.0        NaN        NaN   59.501014   73.941176    0.000000     0.000          0
1   2020-09-18 10:00:00     1697.0        9.0        NaN        NaN   57.807896   69.111111    0.000000     0.000          0
2   2020-09-18 11:00:00     6895.0        6.0        2.0        1.0   54.037273   98.333333   33.000000  1511.000          0
3   2020-09-18 12:00:00     2943.0       97.0        NaN        NaN   74.334353   74.268041    0.000000     0.000          0
4   2020-09-18 13:00:00     2299.0       43.0        NaN        NaN   70.539800  102.302326    0.000000     0.000          0

However, if we access just a single row, fillna on the resulting Series works as expected:

>>> df.iloc[0]
timestamp    2020-09-18 09:00:00
count_200                   4932
count_201                     51
count_503                    NaN
count_504                    NaN
mean_200                  59.501
mean_201                 73.9412
mean_503                       0
mean_504                       0
count_500                      0
Name: 0, dtype: object

>>> df.iloc[0].fillna(0)
timestamp    2020-09-18 09:00:00
count_200                   4932
count_201                     51
count_503                      0
count_504                      0
mean_200                  59.501
mean_201                 73.9412
mean_503                       0
mean_504                       0
count_500                      0
Name: 0, dtype: object

What is going on here?

>>> df.iloc[0,3]
nan
>>> type(df.iloc[0,3])
<class 'numpy.float64'>

pandas does recognize the value as NA:

>>> df.isna()
     timestamp  count_200  count_201  count_503  count_504  mean_200  mean_201  mean_503  mean_504  count_500
0        False      False      False       True       True     False     False     False     False      False
1        False      False      False       True       True     False     False     False     False      False
2        False      False      False      False      False     False     False     False     False      False
3        False      False      False       True       True     False     False     False     False      False
4        False      False      False       True       True     False     False     False     False      False

But with NumPy's built-in function, the values can be fixed from within pandas:

>>> df.head().apply(np.nan_to_num)
            timestamp  count_200  count_201  count_503  count_504   mean_200    mean_201  mean_503  mean_504  count_500
0 2020-09-18 09:00:00     4932.0       51.0        0.0        0.0  59.501014   73.941176       0.0       0.0          0
1 2020-09-18 10:00:00     1697.0        9.0        0.0        0.0  57.807896   69.111111       0.0       0.0          0
2 2020-09-18 11:00:00     6895.0        6.0        2.0        1.0  54.037273   98.333333      33.0    1511.0          0
3 2020-09-18 12:00:00     2943.0       97.0        0.0        0.0  74.334353   74.268041       0.0       0.0          0
4 2020-09-18 13:00:00     2299.0       43.0        0.0        0.0  70.539800  102.302326       0.0       0.0          0
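For reference, a minimal sketch of the same NumPy workaround restricted to the float columns, so the timestamp column is never passed through np.nan_to_num. The small frame below is a made-up stand-in for the real data, and the `nan=` keyword assumes NumPy 1.17 or later. Note that np.nan_to_num also replaces infinities with large finite numbers, which fillna does not do.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame in the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-09-18 09:00:00", "2020-09-18 11:00:00"]),
    "count_200": [4932.0, 6895.0],
    "count_503": [np.nan, 2.0],
})

# Pick only the float columns and replace NaN with 0.0 in one vectorized call.
float_cols = df.select_dtypes(include="float").columns
df[float_cols] = np.nan_to_num(df[float_cols].to_numpy(), nan=0.0)

remaining_nans = int(df.isna().sum().sum())
```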

Is this expected? I cannot find any documentation for it. What am I missing? Is this a bug?

3 answers:

Answer 0: (score: 2)

df.head()

             timestamp  count_200  count_201  count_503  count_504   mean_200    mean_201  mean_503  mean_504  count_500
0 2020-09-18  09:00:00     4932.0       51.0        NaN        NaN  59.501014   73.941176       0.0       0.0          0
1 2020-09-18  10:00:00     1697.0        9.0        NaN        NaN  57.807896   69.111111       0.0       0.0          0
2 2020-09-18  11:00:00     6895.0        6.0        2.0        1.0  54.037273   98.333333      33.0    1511.0          0
3 2020-09-18  12:00:00     2943.0       97.0        NaN        NaN  74.334353   74.268041       0.0       0.0          0
4 2020-09-18  13:00:00     2299.0       43.0        NaN        NaN  70.539800  102.302326       0.0       0.0          0

Replacing NaN with 0:

df.fillna(0)

             timestamp  count_200  count_201  count_503  count_504   mean_200    mean_201  mean_503  mean_504  count_500
0 2020-09-18  09:00:00     4932.0       51.0        0.0        0.0  59.501014   73.941176       0.0       0.0          0
1 2020-09-18  10:00:00     1697.0        9.0        0.0        0.0  57.807896   69.111111       0.0       0.0          0
2 2020-09-18  11:00:00     6895.0        6.0        2.0        1.0  54.037273   98.333333      33.0    1511.0          0
3 2020-09-18  12:00:00     2943.0       97.0        0.0        0.0  74.334353   74.268041       0.0       0.0          0
4 2020-09-18  13:00:00     2299.0       43.0        0.0        0.0  70.539800  102.302326       0.0       0.0          0

This works fine for me.

Use inplace=True to apply the change to the DataFrame itself:

df.fillna(0, inplace=True)
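A minimal sketch of why this matters, using a made-up two-column frame: fillna returns a new DataFrame by default and leaves the original untouched, so the result has to be reassigned (or inplace=True used) before the original name shows the filled values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the question's data.
df = pd.DataFrame({"count_503": [np.nan, 2.0], "count_504": [np.nan, 1.0]})

# fillna returns a new DataFrame; df itself still holds the NaN values.
filled = df.fillna(0)
nans_in_original = int(df.isna().sum().sum())

# Reassigning the result (or calling fillna with inplace=True) updates df.
df = df.fillna(0)
nans_after_reassign = int(df.isna().sum().sum())
```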

The pandas version I am using is:

print(pd.__version__)
0.23.0

Please restart your IDE / Python kernel.

Check your pandas version and update it if needed.

Answer 1: (score: 1)

df[df.isna().any()] = 0

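You can use this. The pandas library can be confusing because there are many ways to do the same thing; I usually just try things until something works rather than getting stuck. Let me know whether this works, or at least what it does.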
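If the intent is to touch only the columns that actually contain NaN, one possible sketch is to select those columns first and fill them. This is a variant written for illustration, not the one-liner above, and the frame is a made-up stand-in for the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame in the question.
df = pd.DataFrame({
    "count_200": [4932.0, 1697.0],
    "count_503": [np.nan, 2.0],
    "count_500": [0, 0],
})

# df.isna().any() is True for each column that has at least one NaN;
# use it as a boolean mask over the column labels.
nan_cols = df.columns[df.isna().any().to_numpy()]

# Fill only those columns and write the result back.
df[nan_cols] = df[nan_cols].fillna(0)
```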

Answer 2: (score: 0)

I cannot seem to reproduce the problem. If I copy the df you provided and turn it into a DataFrame with pd.read_clipboard(), df.fillna(0) gives me the expected result.
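A clipboard-free sketch of that reproduction attempt, for anyone who wants to rerun it. It parses a small CSV excerpt (a simplified, made-up subset of the posted columns) instead of calling pd.read_clipboard(), so it is an approximation of the check rather than the exact steps.

```python
import io

import numpy as np
import pandas as pd

# Simplified excerpt of the posted data; empty fields are parsed as NaN.
text = """timestamp,count_200,count_503
2020-09-18 09:00:00,4932.0,
2020-09-18 10:00:00,1697.0,
2020-09-18 11:00:00,6895.0,2.0
"""
df = pd.read_csv(io.StringIO(text), parse_dates=["timestamp"])

# The missing cell is an ordinary NaN stored as numpy.float64, as in the question.
cell = df.iloc[0, 2]
cell_is_float64_nan = isinstance(cell, np.float64) and bool(pd.isna(cell))

# On this frame, fillna(0) replaces the NaN values as documented.
result = df.fillna(0)
```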

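When you show the output of df.fillna(0), is that the actual return value, or are you printing df afterwards? If it is the latter, remember to pass the inplace=True argument.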