根据条件从前一行返回值

时间:2021-05-02 01:44:55

标签: python pandas function dataframe

我有以下数据框(如果您对表格有疑问,请参阅下面的字典):

    account_id  contract_id year_activation month_activated date_activated  state   term_months contract_type   renewal_type
0   1   A   2020    December    2020-12-01  CA  6   Original    Regular
1   1   B   2021    February    2021-02-13  CA  12  Upgrade Early
2   1   C   2021    February    2021-04-06  CA  12  Upgrade Early
3   1   I   2021    October 2021-10-23  NY  6   Winback Regular
4   1   N   2021    November    2021-11-11  NY  6   Renewal Early
5   2   K   2021    January 2021-01-01  IL  12  Original    Regular
6   2   F   2021    March   2021-03-23  IL  6   Renewal Early

我希望输出如下:

 account_id contract_id orig_year_activation    orig_month_activated    orig_date_activated orig_state  year_activation month_activated date_activated  state   term_months contract_type   renewal_type
0   1   A   2020    December    2020-12-01  CA  2020    December    2020-12-01  CA  6   Original    Regular
1   1   B   2020    December    2020-12-01  CA  2021    February    2021-02-13  CA  12  Upgrade Early
2   1   C   2020    December    2020-12-01  CA  2021    February    2021-04-06  CA  12  Upgrade Early
3   1   I   2021    October 2021-10-23  NY  2021    October 2021-10-23  NY  6   Winback Regular
4   1   N   2021    October 2021-10-23  NY  2021    November    2021-11-11  NY  6   Renewal Early
5   2   K   2021    January 2021-01-01  IL  2021    January 2021-01-01  IL  12  Original    Regular
6   2   F   2021    January 2021-01-01  IL  2021    March   2021-03-23  IL  6   Renewal Early

一个帐户可以有多个交易。如果较早的合约的 contract_type 为“Original”,则后续合约将复制该合约的详细信息(year_activation、month_activated、date_activated 和 state)。如果有较早的合约类型为“Winback”的交易,后续合约将在“orig_year_activation”、“orig_month_activated”、“orig_date_activated”和“orig_state”列下返回该合约的数据。

这是数据框的字典:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
 'contract_id': {0: 'A', 1: 'B', 2: 'C', 3: 'I', 4: 'N', 5: 'K', 6: 'F'},
 'year_activation': {0: 2020,
  1: 2021,
  2: 2021,
  3: 2021,
  4: 2021,
  5: 2021,
  6: 2021},
 'month_activated': {0: 'December',
  1: 'February',
  2: 'February',
  3: 'October',
  4: 'November',
  5: 'January',
  6: 'March'},
 'date_activated': {0: Timestamp('2020-12-01 00:00:00'),
  1: Timestamp('2021-02-13 00:00:00'),
  2: Timestamp('2021-04-06 00:00:00'),
  3: Timestamp('2021-10-23 00:00:00'),
  4: Timestamp('2021-11-11 00:00:00'),
  5: Timestamp('2021-01-01 00:00:00'),
  6: Timestamp('2021-03-23 00:00:00')},
 'state': {0: 'CA', 1: 'CA', 2: 'CA', 3: 'NY', 4: 'NY', 5: 'IL', 6: 'IL'},
 'term_months': {0: 6, 1: 12, 2: 12, 3: 6, 4: 6, 5: 12, 6: 6},
 'contract_type': {0: 'Original',
  1: 'Upgrade',
  2: 'Upgrade',
  3: 'Winback',
  4: 'Renewal',
  5: 'Original',
  6: 'Renewal'},
 'renewal_type': {0: 'Regular',
  1: 'Early',
  2: 'Early',
  3: 'Regular',
  4: 'Early',
  5: 'Regular',
  6: 'Early'}}

这是结果的字典:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
 'contract_id': {0: 'A', 1: 'B', 2: 'C', 3: 'I', 4: 'N', 5: 'K', 6: 'F'},
 'orig_year_activation': {0: 2020,
  1: 2020,
  2: 2020,
  3: 2021,
  4: 2021,
  5: 2021,
  6: 2021},
 'orig_month_activated': {0: 'December',
  1: 'December',
  2: 'December',
  3: 'October',
  4: 'October',
  5: 'January',
  6: 'January'},
 'orig_date_activated': {0: Timestamp('2020-12-01 00:00:00'),
  1: Timestamp('2020-12-01 00:00:00'),
  2: Timestamp('2020-12-01 00:00:00'),
  3: Timestamp('2021-10-23 00:00:00'),
  4: Timestamp('2021-10-23 00:00:00'),
  5: Timestamp('2021-01-01 00:00:00'),
  6: Timestamp('2021-01-01 00:00:00')},
 'orig_state': {0: 'CA', 1: 'CA', 2: 'CA', 3: 'NY', 4: 'NY', 5: 'IL', 6: 'IL'},
 'year_activation': {0: 2020,
  1: 2021,
  2: 2021,
  3: 2021,
  4: 2021,
  5: 2021,
  6: 2021},
 'month_activated': {0: 'December',
  1: 'February',
  2: 'February',
  3: 'October',
  4: 'November',
  5: 'January',
  6: 'March'},
 'date_activated': {0: Timestamp('2020-12-01 00:00:00'),
  1: Timestamp('2021-02-13 00:00:00'),
  2: Timestamp('2021-04-06 00:00:00'),
  3: Timestamp('2021-10-23 00:00:00'),
  4: Timestamp('2021-11-11 00:00:00'),
  5: Timestamp('2021-01-01 00:00:00'),
  6: Timestamp('2021-03-23 00:00:00')},
 'state': {0: 'CA', 1: 'CA', 2: 'CA', 3: 'NY', 4: 'NY', 5: 'IL', 6: 'IL'},
 'term_months': {0: 6, 1: 12, 2: 12, 3: 6, 4: 6, 5: 12, 6: 6},
 'contract_type': {0: 'Original',
  1: 'Upgrade',
  2: 'Upgrade',
  3: 'Winback',
  4: 'Renewal',
  5: 'Original',
  6: 'Renewal'},
 'renewal_type': {0: 'Regular',
  1: 'Early',
  2: 'Early',
  3: 'Regular',
  4: 'Early',
  5: 'Regular',
  6: 'Early'}}

1 个答案:

答案 0 :(得分:2)

import pandas as pd

# Full DataFrame Code Below
df = pd.DataFrame({...})

# Where contract type is Original or Winback set equal to value
mask = (df['contract_type'].eq('Original') | df['contract_type'].eq('Winback'))

# Add Columns At Row Based on Mask and assign values based on existing columns
df.loc[mask, ['orig_year_activation',
              'orig_month_activated',
              'orig_date_activated',
              'orig_state']] = df.loc[mask, ['year_activation',
                                             'month_activated',
                                             'date_activated',
                                             'state']].values
# Ensure Correct Types
df = df.convert_dtypes()
# Fill The NaNs
df = df.ffill()
# Re Order Columns
df = df[['account_id', 'contract_id',
         'orig_year_activation', 'orig_month_activated',
         'orig_date_activated', 'orig_state', 'year_activation',
         'month_activated', 'date_activated', 'state',
         'term_months', 'contract_type', 'renewal_type']]

# Output
print(df.to_string())

输出:

   account_id contract_id  orig_year_activation orig_month_activated orig_date_activated orig_state  year_activation month_activated date_activated state  term_months contract_type renewal_type
0           1           A                  2020             December          2020-12-01         CA             2020        December     2020-12-01    CA            6      Original      Regular
1           1           B                  2020             December          2020-12-01         CA             2021        February     2021-02-13    CA           12       Upgrade        Early
2           1           C                  2020             December          2020-12-01         CA             2021        February     2021-04-06    CA           12       Upgrade        Early
3           1           I                  2021              October          2021-10-23         NY             2021         October     2021-10-23    NY            6       Winback      Regular
4           1           N                  2021              October          2021-10-23         NY             2021        November     2021-11-11    NY            6       Renewal        Early
5           2           K                  2021              January          2021-01-01         IL             2021         January     2021-01-01    IL           12      Original      Regular
6           2           F                  2021              January          2021-01-01         IL             2021           March     2021-03-23    IL            6       Renewal        Early

完整的DataFrame构建代码:

df = pd.DataFrame({'account_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
                   'contract_id': {0: 'A', 1: 'B', 2: 'C', 3: 'I', 4: 'N',
                                   5: 'K', 6: 'F'},
                   'year_activation': {0: 2020, 1: 2021, 2: 2021, 3: 2021,
                                       4: 2021, 5: 2021, 6: 2021},
                   'month_activated': {0: 'December', 1: 'February',
                                       2: 'February', 3: 'October',
                                       4: 'November', 5: 'January',
                                       6: 'March'},
                   'date_activated': {0: pd.to_datetime('2020-12-01 00:00:00'),
                                      1: pd.to_datetime('2021-02-13 00:00:00'),
                                      2: pd.to_datetime('2021-04-06 00:00:00'),
                                      3: pd.to_datetime('2021-10-23 00:00:00'),
                                      4: pd.to_datetime('2021-11-11 00:00:00'),
                                      5: pd.to_datetime('2021-01-01 00:00:00'),
                                      6: pd.to_datetime('2021-03-23 00:00:00')},
                   'state': {0: 'CA', 1: 'CA', 2: 'CA', 3: 'NY',
                             4: 'NY', 5: 'IL', 6: 'IL'},
                   'term_months': {0: 6, 1: 12, 2: 12, 3: 6,
                                   4: 6, 5: 12, 6: 6},
                   'contract_type': {0: 'Original', 1: 'Upgrade',
                                     2: 'Upgrade', 3: 'Winback',
                                     4: 'Renewal', 5: 'Original',
                                     6: 'Renewal'},
                   'renewal_type': {0: 'Regular', 1: 'Early',
                                    2: 'Early', 3: 'Regular',
                                    4: 'Early', 5: 'Regular',
                                    6: 'Early'}})
相关问题