Question

I have been creating a new pandas column with the usual dictionary-mapping approach, like this:

product_costs = {'x': 1, 'y': 2, 'z': 3}

df['costs'] = df['product'].map(product_costs)

This has worked fine, but the cost of product 'x' recently changed: for example, from 1 April the cost rose from 1 to 4.

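My df also has a date column, and I am trying to work out how to map the value 1 where the date is before 1 April and the value 4 where the date is on or after 1 April.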

I could probably do this iteratively with a for loop, i.e.:

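df['costs'] = ''

# A bare 2021-04-01 is not a valid date literal; compare against a Timestamp
cutoff = pd.Timestamp('2021-04-01')

# enumerate() advances the position on every row (assumes a default 0..n-1 index)
for index, i in enumerate(df['product']):
    if i == 'x' and df.loc[index, 'date'] < cutoff:
        df.loc[index, 'costs'] = 1
    elif i == 'x' and df.loc[index, 'date'] >= cutoff:
        df.loc[index, 'costs'] = 4
    # elif i == 'y':
    #     etc. etc.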

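...but this seems long-winded and tedious when I am sure the same result can be achieved more simply. Can anyone suggest how to include the "where date" condition in my mapping?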

Edit - sample data below

date (dd-mm)        product
01-02                  x
01-02                  y
01-02                  z
01-03                  x
01-03                  y
01-03                  z
01-04                  x
01-04                  y
01-04                  z

becomes...

date (dd-mm)        product        cost
01-02                  x            1
01-02                  y            2
01-02                  z            3
01-03                  x            1
01-03                  y            2
01-03                  z            3
01-04                  x            4
01-04                  y            2
01-04                  z            3

Answer 1

`np.where()`

You can use np.where() with a condition on the date.

First convert the date column with to_datetime(). Assuming your dates lack a year (%d-%m) but you want the year to be 2021:

df['date'] = pd.to_datetime(df['date'], format='%d-%m').apply(lambda x: x.replace(year=2021))
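As a possible variation on the conversion above, the year can be appended to the string before parsing, which avoids the row-by-row replace(). This is only a sketch: the three-row frame is made up for illustration, and it assumes every row belongs to 2021.

```python
import pandas as pd

# Illustrative data only; dates are day-month strings without a year
df = pd.DataFrame({
    'date': ['01-02', '01-03', '01-04'],
    'product': ['x', 'x', 'x'],
})

# Assumption: all rows fall in 2021, so add the year and parse in one step
df['date'] = pd.to_datetime(df['date'] + '-2021', format='%d-%m-%Y')
```

If the data spans several years, the year would have to come from somewhere else in the data, and this shortcut would not apply.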

Then map with np.where(), conditioned on the date:

costs_pre = {'x': 1, 'y': 2, 'z': 3}
costs_post = {'x': 4, 'y': 2, 'z': 3}

df['costs'] = np.where(
    df['date'] < '2021-04-01',
    df['product'].map(costs_pre),
    df['product'].map(costs_post))

#         date  product  costs
# 0 2021-02-01        x      1
# 1 2021-02-01        y      2
# 2 2021-02-01        z      3
# 3 2021-03-01        x      1
# 4 2021-03-01        y      2
# 5 2021-03-01        z      3
# 6 2021-04-01        x      4
# 7 2021-04-01        y      2
# 8 2021-04-01        z      3
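For reference, here is a self-contained sketch of the same np.where() approach that can be run as is on the sample data from the question. The names base_costs, changes and cutoff are chosen here for illustration, and the post-change dictionary is derived from the base one so only the changed product has to be listed.

```python
import numpy as np
import pandas as pd

# Sample data from the question, assuming all dates are in 2021
df = pd.DataFrame({
    'date': ['01-02'] * 3 + ['01-03'] * 3 + ['01-04'] * 3,
    'product': ['x', 'y', 'z'] * 3,
})
df['date'] = pd.to_datetime(df['date'] + '-2021', format='%d-%m-%Y')

base_costs = {'x': 1, 'y': 2, 'z': 3}
changes = {'x': 4}                       # only the products whose cost changed
new_costs = {**base_costs, **changes}    # everything else keeps its old cost
cutoff = pd.Timestamp('2021-04-01')

# Rows before the cutoff use the old costs; rows on or after it use the new ones
df['costs'] = np.where(
    df['date'] < cutoff,
    df['product'].map(base_costs),
    df['product'].map(new_costs),
)
```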

`np.select()`

If there are multiple conditions, you could nest np.where(), but np.select() is cleaner.

For example, if the costs change on 01-03 and again on 01-04:

costs1 = {'x': 1, 'y': 2, 'z': 3}
costs2 = {'x': 4, 'y': 2, 'z': 3}
costs3 = {'x': 100, 'y': 2, 'z': 3}

conditions = [df['date'] < '2021-03-01', df['date'] < '2021-04-01']
choices = [df['product'].map(costs1), df['product'].map(costs2)]

df['costs'] = np.select(conditions, choices, default=df['product'].map(costs3))

#         date product  costs
# 0 2021-02-01       x      1
# 1 2021-02-01       y      2
# 2 2021-02-01       z      3
# 3 2021-03-01       x      4
# 4 2021-03-01       y      2
# 5 2021-03-01       z      3
# 6 2021-04-01       x    100
# 7 2021-04-01       y      2
# 8 2021-04-01       z      3
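If more price changes accumulate, the conditions and choices could be built from a single schedule rather than written out by hand. The following is a rough sketch under that assumption; the periods list and the latest dictionary are hypothetical names, and the small frame is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2021-02-01', '2021-03-01', '2021-04-01']),
    'product': ['x', 'x', 'x'],
})

# Hypothetical schedule: each dict applies to dates strictly before its cutoff,
# listed in chronological order because np.select() takes the first match
periods = [
    (pd.Timestamp('2021-03-01'), {'x': 1, 'y': 2, 'z': 3}),
    (pd.Timestamp('2021-04-01'), {'x': 4, 'y': 2, 'z': 3}),
]
latest = {'x': 100, 'y': 2, 'z': 3}  # used when no earlier cutoff matches

conditions = [df['date'] < cutoff for cutoff, _ in periods]
choices = [df['product'].map(costs) for _, costs in periods]

df['costs'] = np.select(conditions, choices, default=df['product'].map(latest))
```

The ordering matters here: a date before 2021-03-01 also satisfies the second condition, and it gets the first period's cost only because that condition is checked first.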

Answer 2

pandas' where is also useful here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.where.html
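A minimal sketch of how that might look with Series.where(), which keeps a value where the condition is True and takes it from the second argument otherwise. The frame and the variable name before_change are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2021-03-01', '2021-04-01', '2021-04-01']),
    'product': ['x', 'x', 'y'],
})

costs_pre = {'x': 1, 'y': 2, 'z': 3}
costs_post = {'x': 4, 'y': 2, 'z': 3}

before_change = df['date'] < pd.Timestamp('2021-04-01')

# Keep the old cost where the date precedes the change, else use the new cost
df['costs'] = df['product'].map(costs_pre).where(
    before_change, df['product'].map(costs_post)
)
```

This stays entirely within pandas, so the result is a Series aligned on the index of df rather than a bare NumPy array.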
