Question

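(Python version 3.7.3) I have a DataFrame with 3 columns: region, channel, and Annual_contacts. Below is some dummy data, which I want to group by region and channel and then compute the standard deviation for. I can do this by creating a new DataFrame (groupby and apply) and then merging it with the original DataFrame, but I have read that groupby and transform is a faster, cleaner way to do it. Sadly, the numbers from groupby/apply and groupby/transform are different (apply is correct, transform is wrong). Can anyone point out what is wrong with my transform syntax?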

    import pandas as pd
    import numpy as np

    contacts = pd.DataFrame({
         'region':['rg1', 'rg1', 'rg1', 'rg1', 'rg1', 'rg1', 'rg2', 'rg2', 'rg2']
         ,'channel': ['ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1']
         , 'yearly_contacts' : [8, 16, 16, 50, 50, 4, 15, 20, 5]
         })

    # These are the numbers I expect:
    expected_stdev = contacts.groupby(['region', 'channel'])['yearly_contacts'].apply(np.std).reset_index()

    # But I want them directly added as 4th column
    contacts['actual_stdev'] = contacts.groupby(['region', 'channel'])["yearly_contacts"].transform(np.std)
    # It works, but why are the numbers different?
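
One possible explanation, offered as an assumption rather than a confirmed diagnosis: some pandas versions replace `np.std` passed to `transform` with the built-in groupby `std`, which defaults to the sample standard deviation (`ddof=1`), while `np.std` called through `apply` uses NumPy's default, the population standard deviation (`ddof=0`). The sketch below sidesteps that ambiguity by passing `ddof` explicitly. The column names `stdev_ddof0` and `stdev_ddof1` are made up for illustration.

```python
import pandas as pd

contacts = pd.DataFrame({
    'region': ['rg1', 'rg1', 'rg1', 'rg1', 'rg1', 'rg1', 'rg2', 'rg2', 'rg2'],
    'channel': ['ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1', 'ch1'],
    'yearly_contacts': [8, 16, 16, 50, 50, 4, 15, 20, 5],
})

grouped = contacts.groupby(['region', 'channel'])['yearly_contacts']

# Population standard deviation (ddof=0), which is NumPy's default for np.std.
# 'stdev_ddof0' is a hypothetical column name.
contacts['stdev_ddof0'] = grouped.transform(lambda s: s.std(ddof=0))

# Sample standard deviation (ddof=1), which is the pandas default for .std().
# 'stdev_ddof1' is a hypothetical column name.
contacts['stdev_ddof1'] = grouped.transform(lambda s: s.std(ddof=1))

print(contacts)
```

If the two columns match the apply and transform results from the question respectively, the difference comes down to the `ddof` default and not to the transform syntax. For the rg1 group the two values should be roughly 18.87 (`ddof=0`) and 20.67 (`ddof=1`).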

Groupby/transform with standard deviation

0 answers: