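Add values to new columns based on conditions in other columns

Time: 2021-05-02 16:49:32

Tags: python pandas dataframe numpy

I have the following DataFrame (I had some trouble getting the table to display properly, so please see the dictionary at the end of the question):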

 account_id contract_id date_activated  term_months 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00 2021-04-01 00:00:00 2021-05-01 00:00:00 2021-06-01 00:00:00 2021-07-01 00:00:00 2021-08-01 00:00:00 2021-09-01 00:00:00 2021-10-01 00:00:00 2021-11-01 00:00:00 2021-12-01 00:00:00
0   1   A   2021-01-01  1   200.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   1   B   2021-02-13  12  0.0 300.0   300.0   300.0   300.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
2   1   C   2021-04-06  12  0.0 0.0 0.0 400.0   400.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
3   1   I   2020-10-23  6   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 150.0   150.0   0.0
4   1   N   2021-11-11  6   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100.0   100.0
5   2   K   2021-01-01  12  100.0   100.0   100.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
6   2   F   2021-03-23  6   0.0 0.0 50.0    50.0    50.0    50.0    50.0    50.0    0.0 0.0 0.0 0.0

I want a result like the one shown below (with the new columns contract_type and renewal_type):

 account_id contract_id date_activated  term_months contract_type   renewal_type    2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00 2021-04-01 00:00:00 2021-05-01 00:00:00 2021-06-01 00:00:00 2021-07-01 00:00:00 2021-08-01 00:00:00 2021-09-01 00:00:00 2021-10-01 00:00:00 2021-11-01 00:00:00 2021-12-01 00:00:00
0   1   A   2021-01-01  1   Original    Regular 200.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   1   B   2021-02-13  12  Upgrade Regular 0.0 300.0   300.0   300.0   300.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
2   1   C   2021-04-06  12  Upgrade Early   0.0 0.0 0.0 400.0   400.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
3   1   I   2020-10-23  6   Winback Regular 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 150.0   150.0   0.0
4   1   N   2021-11-11  6   Renewal Early   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100.0   100.0
5   2   K   2021-01-01  12  Original    Regular 100.0   100.0   100.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
6   2   F   2021-03-23  6   Renewal Early   0.0 0.0 50.0    50.0    50.0    50.0    50.0    50.0    0.0 0.0 0.0 0.0

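You can download a sample copy of the result as an Excel file from this link: https://drive.google.com/file/d/16BLoSugMaDdB8Qac2ATJRBLRvx3HCIus/view?usp=sharing

Each account has multiple contracts. I want to add two columns based on the monthly transaction amounts (the fifth column onward).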

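The renewal type should be either 'Regular' or 'Early'. It is 'Regular' when the contract is an 'Original' or a 'Winback' contract. It is also 'Regular' if the previous contract ended in the previous month and the first payment for the new contract falls in the following month. It is 'Early' when a new contract is made for the same account while the previous contract has not yet expired or reached the end of its term (based on term_months).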
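As a rough illustration of the renewal rule, here is a minimal sketch for a single contract. The function name and its parameters are hypothetical, and it assumes that a contract's term starts in the month of its first payment, which the question does not state explicitly.

```python
import pandas as pd


def renewal_type(contract_type, prev_start, prev_term_months, new_start):
    """Return 'Regular' or 'Early' for one contract.

    prev_start and new_start are the first payment months of the previous
    and the new contract (assumption: the term starts with the first payment).
    """
    # Original and Winback contracts are always Regular.
    if contract_type in ("Original", "Winback"):
        return "Regular"
    # First month that is no longer covered by the previous contract's term.
    prev_end = pd.Period(prev_start, freq="M") + prev_term_months
    # A new contract that starts before that month is Early.
    return "Early" if pd.Period(new_start, freq="M") < prev_end else "Regular"


# Contract B follows the 1-month contract A; contract C starts inside B's 12-month term.
print(renewal_type("Upgrade", "2021-01", 1, "2021-02"))
print(renewal_type("Upgrade", "2021-02", 12, "2021-04"))
```

Working with monthly `pd.Period` values keeps the comparison at month granularity, so the day of the month on which a contract was activated does not affect the outcome.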

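The contract type should be 'Original', 'Renewal', 'Upgrade', or 'Winback'. It is 'Original' if it is the first contract for the account. It is 'Winback' if the new contract is made after 4 months with no payment/transaction on the previous contract. If it is not a 'Winback' and the new contract's payment is higher than the previous contract's, it is an 'Upgrade'. If it is none of 'Original', 'Upgrade', or 'Winback', it is considered a 'Renewal'.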
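One possible reading of these rules, applied to the sample data, is sketched below. It is not a verified answer. All names (`SPEC`, `build_sample`, `classify`, `winback_gap`) are hypothetical, and it rests on several assumptions: contracts are ordered by their first paid month rather than by date_activated (contract I's activation date is a year earlier than its payments), the "previous contract" is the one just before in that order, a term starts in the first paid month, every contract has at least one payment, and the month columns are consecutive. The sample is rebuilt in condensed form without date_activated.

```python
import numpy as np
import pandas as pd

MONTHS = [f"2021-{m:02d}" for m in range(1, 13)]  # one payment column per month

# Sample from the question, condensed to:
# (account_id, contract_id, term_months, first paid month, last paid month, amount)
SPEC = [
    (1, "A", 1, 1, 1, 200.0),
    (1, "B", 12, 2, 5, 300.0),
    (1, "C", 12, 4, 5, 400.0),
    (1, "I", 6, 10, 11, 150.0),
    (1, "N", 6, 11, 12, 100.0),
    (2, "K", 12, 1, 3, 100.0),
    (2, "F", 6, 3, 8, 50.0),
]


def build_sample():
    """Rebuild the wide table (without date_activated) from SPEC."""
    rows = []
    for acc, cid, term, first, last, amt in SPEC:
        row = {"account_id": acc, "contract_id": cid, "term_months": term}
        for m, col in enumerate(MONTHS, start=1):
            row[col] = amt if first <= m <= last else 0.0
        rows.append(row)
    return pd.DataFrame(rows)


def classify(df, month_cols, winback_gap=4):
    """Add contract_type and renewal_type in front of the month columns."""
    pay = df[month_cols].to_numpy(dtype=float)
    first = (pay > 0).argmax(axis=1)  # position of each contract's first paid month
    amount = pay.max(axis=1)          # monthly payment of each contract
    terms = df["term_months"].to_numpy()
    ctypes = [None] * len(df)
    rtypes = [None] * len(df)

    # .indices maps each account to the row positions of its contracts.
    for rows in df.groupby("account_id").indices.values():
        account_paid = (pay[rows] > 0).any(axis=0)  # months with any payment
        prev = None
        for i in sorted(rows, key=lambda r: first[r]):
            if prev is None:
                ctype = "Original"
            else:
                # Unpaid months directly before this contract's first payment.
                paid_before = np.flatnonzero(account_paid[:first[i]])
                gap = first[i] - 1 - paid_before[-1] if paid_before.size else 0
                if gap >= winback_gap:
                    ctype = "Winback"
                elif amount[i] > amount[prev]:
                    ctype = "Upgrade"
                else:
                    ctype = "Renewal"
            if ctype in ("Original", "Winback"):
                rtype = "Regular"
            else:
                # Early if the previous contract's term has not run out yet.
                early = first[i] < first[prev] + terms[prev]
                rtype = "Early" if early else "Regular"
            ctypes[i], rtypes[i] = ctype, rtype
            prev = i

    out = df.copy()
    loc = out.columns.get_loc(month_cols[0])
    out.insert(loc, "contract_type", ctypes)
    out.insert(loc + 1, "renewal_type", rtypes)
    return out


result = classify(build_sample(), MONTHS)
print(result[["account_id", "contract_id", "contract_type", "renewal_type"]])
```

On this sample the sketch reproduces the desired labels for all seven contracts. On real data the ordering and term-start assumptions would need to be checked against how contracts are actually activated and billed.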

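I tried to do this with the code below, but it has some problems: it classifies some 'Original' contracts as 'Winback' (for contract_type) and some 'Early' ones as 'Regular' (for renewal_type):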

import pandas as pd

def get_types(monthly_payments):
    def f(s):
        check = monthly_payments.loc[
            (s.date_activated.year == monthly_payments.index.year) &
            (s.date_activated.month == monthly_payments.index.month)
            ].iloc[0]

        if check.wb == 0:
            # If rolling sum of 4 months prior is 0
            s['contract_type'] = 'Winback'
        elif check.og_upg == 0:
            # If Prior Month is 0
            s['contract_type'] = 'Original'

        elif check.max_pmt > check.og_upg:
            # If Prior Month is not missing and current month is more
            s['contract_type'] = 'Upgrade'
        else:
            s['contract_type'] = 'Renewal'

        if check.early:
            # If Early
            s['renewal_type'] = 'Early'
        else:
            s['renewal_type'] = 'Regular'
        return s

    return f

def apply_types(g):
    # Get Non Payment Info
    account_info = g[g.columns[:4]]
    # Transpose Monthly Payments To Rows
    monthly_payments = g.loc[:, g.columns[4:]].T
    # Make Sure Index is DT
    monthly_payments.index = pd.to_datetime(monthly_payments.index)
    # Get Check for is early based on number of payments
    monthly_payments['early'] = monthly_payments.astype(bool).sum(axis=1) > 1
    # Max Payment In Month
    monthly_payments['max_pmt'] = monthly_payments.max(axis=1)
    # 1 Month Prior
    monthly_payments['og_upg'] = monthly_payments.max_pmt.shift().fillna(0)
    # Rolling Sum of 4 Months Prior
    monthly_payments['wb'] = monthly_payments.max_pmt \
        .rolling(min_periods=0, window=4).sum().shift()
    # Concat New Columns With Original Payment Information
    return pd.concat((
        account_info.apply(get_types(monthly_payments), axis=1),
        g[g.columns[4:]]
    ), axis=1)

df = df.groupby('account_id', as_index=False).apply(apply_types).reset_index(drop=True)

Here is the dictionary for the DataFrame:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
 'contract_id': {0: 'A', 1: 'B', 2: 'C', 3: 'I', 4: 'N', 5: 'K', 6: 'F'},
 'date_activated': {0: Timestamp('2021-01-01 00:00:00'),
  1: Timestamp('2021-02-13 00:00:00'),
  2: Timestamp('2021-04-06 00:00:00'),
  3: Timestamp('2020-10-23 00:00:00'),
  4: Timestamp('2021-11-11 00:00:00'),
  5: Timestamp('2021-01-01 00:00:00'),
  6: Timestamp('2021-03-23 00:00:00')},
 'term_months': {0: 1, 1: 12, 2: 12, 3: 6, 4: 6, 5: 12, 6: 6},
 datetime.datetime(2021, 1, 1, 0, 0): {0: 200.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 100.0,
  6: 0.0},
 datetime.datetime(2021, 2, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 100.0,
  6: 0.0},
 datetime.datetime(2021, 3, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 100.0,
  6: 50.0},
 datetime.datetime(2021, 4, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 400.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 5, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 400.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 6, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 7, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 8, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 9, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0},
 datetime.datetime(2021, 10, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 150.0,
  4: 0.0,
  5: 0.0,
  6: 0.0},
 datetime.datetime(2021, 11, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 150.0,
  4: 100.0,
  5: 0.0,
  6: 0.0},
 datetime.datetime(2021, 12, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 100.0,
  5: 0.0,
  6: 0.0}}

Here is the dictionary for the result:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
 'contract_id': {0: 'A', 1: 'B', 2: 'C', 3: 'I', 4: 'N', 5: 'K', 6: 'F'},
 'date_activated': {0: Timestamp('2021-01-01 00:00:00'),
  1: Timestamp('2021-02-13 00:00:00'),
  2: Timestamp('2021-04-06 00:00:00'),
  3: Timestamp('2020-10-23 00:00:00'),
  4: Timestamp('2021-11-11 00:00:00'),
  5: Timestamp('2021-01-01 00:00:00'),
  6: Timestamp('2021-03-23 00:00:00')},
 'term_months': {0: 1, 1: 12, 2: 12, 3: 6, 4: 6, 5: 12, 6: 6},
 'contract_type': {0: 'Original',
  1: 'Upgrade',
  2: 'Upgrade',
  3: 'Winback',
  4: 'Renewal',
  5: 'Original',
  6: 'Renewal'},
 'renewal_type': {0: 'Regular',
  1: 'Regular',
  2: 'Early',
  3: 'Regular',
  4: 'Early',
  5: 'Regular',
  6: 'Early'},
 datetime.datetime(2021, 1, 1, 0, 0): {0: 200.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 100.0,
  6: 0.0},
 datetime.datetime(2021, 2, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 100.0,
  6: 0.0},
 datetime.datetime(2021, 3, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 100.0,
  6: 50.0},
 datetime.datetime(2021, 4, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 400.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 5, 1, 0, 0): {0: 0.0,
  1: 300.0,
  2: 400.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 6, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 7, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 8, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 50.0},
 datetime.datetime(2021, 9, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0},
 datetime.datetime(2021, 10, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 150.0,
  4: 0.0,
  5: 0.0,
  6: 0.0},
 datetime.datetime(2021, 11, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 150.0,
  4: 100.0,
  5: 0.0,
  6: 0.0},
 datetime.datetime(2021, 12, 1, 0, 0): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 100.0,
  5: 0.0,
  6: 0.0}}

0 answers:

No answers