Filling in missing data with pandas

Time: 2015-12-08 14:22:08

Tags: python pandas

I have two pandas DataFrames

>>> import pandas as pd
>>> import numpy as np

>>> df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
               index=[['A',  'A', 'B', 'B'], [1, 2, 1, 2]])

>>> df1
     a   b
A 1  1 NaN
  2  2 NaN
B 1  3   3
  2  4   4

>>> df2 = pd.DataFrame({'b': [1, 2]}, index=[['A','A'], [1, 2]])
>>> df2

     b
A 1  1
  2  2

where df2 contains the data missing from df1. How can I merge the two DataFrames to get

     a   b
A 1  1   1
  2  2   2
B 1  3   3
  2  4   4

? I tried pd.concat([df1, df2], axis=1), which results in

     a   b   b
A 1  1 NaN   1
  2  2 NaN   2
B 1  3   3 NaN
  2  4   4 NaN

In my case it is guaranteed that there are no overlapping values.

2 answers:

Answer 0 (score: 3)

You can try combine_first or fillna:

print(df1.combine_first(df2))
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4

print(df1.fillna(df2))
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4

Timing:

In [5]: %timeit df1.combine_first(df2)
The slowest run took 6.01 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 2.15 ms per loop

In [6]: %timeit df1.fillna(df2)
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 2.76 ms per loop
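
For the data in the question both calls give the same result, but they are not interchangeable in general: combine_first returns the union of both indexes and columns, while fillna keeps the shape of df1. A minimal sketch of that difference, where df2_extra is a hypothetical frame (not from the question) that adds a row df1 does not have:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])

# Hypothetical variant of df2 with one extra row, ('C', 1), absent from df1.
df2_extra = pd.DataFrame({'b': [1, 2, 9]},
                         index=[['A', 'A', 'C'], [1, 2, 1]])

combined = df1.combine_first(df2_extra)  # union of both indexes
filled = df1.fillna(df2_extra)           # only the labels already in df1

print(combined.shape)
print(filled.shape)
```

If df2 may carry rows you do not want in the result, fillna is probably the safer of the two; if you want those rows added, combine_first does that in one step.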

Answer 1 (score: 2)

You can also use update:

In [36]: df1.update(df2)

In [37]: df1
Out[37]:
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4
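
Keep in mind that update mutates the frame it is called on and returns None, so the NaNs in df1 are gone afterwards. A minimal sketch, assuming you want to leave df1 untouched, is to apply it to a copy (the name patched is only illustrative). By default update also overwrites existing values with any non-NaN value from the other frame; passing overwrite=False restricts it to filling NaNs, which makes no difference here because the values do not overlap.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df2 = pd.DataFrame({'b': [1, 2]}, index=[['A', 'A'], [1, 2]])

# Work on a copy so the original df1 keeps its NaNs.
patched = df1.copy()
result = patched.update(df2)  # in-place; the return value is None

print(result)
print(patched)
print(df1['b'].isna().sum())  # NaNs still present in the original
```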