Populate a column in a dataframe if a condition is met

Time: 2018-12-18 15:28:12

Tags: python pandas

I have the following dataframe:

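PersonID  AmountPaid  PaymentReceivedDate  StartDate
1         100         2017                 2016
2         20          2014                 2014
1         30          2017                 2016
1         40          2016                 2016
4         300         2015                 2000
5         150         2005                 2002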

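What I'm looking for: if a payment was received within n years of the start date, the withinNYears column should show the amount paid; otherwise it should show NaN. N can be any number, but for this example let's say 2 years (since that is what I'll use to look at the results).

So basically, for payments made within 2 years, the dataframe above would come out like this: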

PersonID  AmountPaid  PaymentReceivedDate  StartDate  withinNYears
1         100         2017                 2016       100
2         20          2014                 2014       20
1         30          2017                 2016       30
1         40          2016                 2016       40
4         300         2015                 2000       NaN
5         150         2005                 2002       NaN

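Does anyone know how to achieve this? Cheers.

2 Answers:

Answer 0: (score: 3)

Subtract the columns and compare the result with a scalar to get a boolean mask, then set the values with numpy.where, Series.where, or DataFrame.loc: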

import numpy as np

# Boolean mask: True where the payment came less than 2 years after the start
m = (df['PaymentReceivedDate'] - df['StartDate']) < 2
df['withinNYears'] = np.where(m, df['AmountPaid'], np.nan)
# Alternatives:
# df['withinNYears'] = df['AmountPaid'].where(m)
# df.loc[m, 'withinNYears'] = df['AmountPaid']

print(df)
   PersonID  AmountPaid  PaymentReceivedDate  StartDate  withinNYears
0         1         100                 2017       2016         100.0
1         2          20                 2014       2014          20.0
2         1          30                 2017       2016          30.0
3         1          40                 2016       2016          40.0
4         4         300                 2015       2000           NaN
5         5         150                 2005       2002           NaN
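
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the np.where approach. It assumes the year columns are plain integers, as in the question's sample data; the name n_years is only an illustrative parameter and does not appear in the original post. The strict `<` comparison follows the answer above; whether a payment exactly n years after the start should count is a choice left to the reader.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question; years are stored as plain integers.
df = pd.DataFrame({
    "PersonID": [1, 2, 1, 1, 4, 5],
    "AmountPaid": [100, 20, 30, 40, 300, 150],
    "PaymentReceivedDate": [2017, 2014, 2017, 2016, 2015, 2005],
    "StartDate": [2016, 2014, 2016, 2016, 2000, 2002],
})

n_years = 2  # hypothetical parameter name; change to any window you need

# True where the payment arrived fewer than n_years after the start year.
mask = (df["PaymentReceivedDate"] - df["StartDate"]) < n_years

# Keep AmountPaid where the mask holds, NaN elsewhere.
df["withinNYears"] = np.where(mask, df["AmountPaid"], np.nan)

print(df)
```

Because NaN is a float, the new column comes out as float even though AmountPaid holds integers.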

Edit:

If the StartDate column contains datetimes:

m = (df['PaymentReceivedDate'] - df['StartDate'].dt.year) < 2
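
A small sketch of the datetime case, assuming StartDate is a datetime64 column while PaymentReceivedDate is still an integer year. The specific dates below are made up for illustration and are not from the question.

```python
import pandas as pd

# Two illustrative rows; the month and day in StartDate are invented.
df = pd.DataFrame({
    "AmountPaid": [100, 300],
    "PaymentReceivedDate": [2017, 2015],
    "StartDate": pd.to_datetime(["2016-03-01", "2000-07-15"]),
})

# .dt.year extracts the integer year so it can be subtracted from the payment year.
m = (df["PaymentReceivedDate"] - df["StartDate"].dt.year) < 2

# Series.where keeps values where the mask is True and puts NaN elsewhere.
df["withinNYears"] = df["AmountPaid"].where(m)

print(df)
```

Note that this compares calendar years only, so the month and day of StartDate play no part in the result.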

Answer 1: (score: 3)

Just assign with loc:

df.loc[(df['PaymentReceivedDate'] - df['StartDate']) < 2, 'withinNYears'] = df['AmountPaid']
df
Out[37]: 
   PersonID  AmountPaid      ...       StartDate  withinNYears
0         1         100      ...            2016         100.0
1         2          20      ...            2014          20.0
2         1          30      ...            2016          30.0
3         1          40      ...            2016          40.0
4         4         300      ...            2000           NaN
5         5         150      ...            2002           NaN
[6 rows x 5 columns]
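
The loc assignment above can be tried on its own with the sketch below. It assumes integer year columns as in the question; n_years is again just an illustrative name. The column does not need to exist beforehand: loc creates it, and rows not selected by the mask are left as NaN.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "PersonID": [1, 2, 1, 1, 4, 5],
    "AmountPaid": [100, 20, 30, 40, 300, 150],
    "PaymentReceivedDate": [2017, 2014, 2017, 2016, 2015, 2005],
    "StartDate": [2016, 2014, 2016, 2016, 2000, 2002],
})

n_years = 2  # hypothetical parameter name
mask = (df["PaymentReceivedDate"] - df["StartDate"]) < n_years

# The right-hand side is aligned on the index, so each selected row
# receives its own AmountPaid; unselected rows stay NaN.
df.loc[mask, "withinNYears"] = df["AmountPaid"]

print(df)
```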