How do I apply a set intersection with the previous row in a MultiIndex pandas DataFrame?

Time: 2019-03-01 11:10:29

Tags: python pandas dataframe intersection

Here is my MultiIndex DataFrame:

import pandas as pd

df = pd.DataFrame({'category':['A','A','A','B','B','B'],
                   'row':[1,2,3,1,2,3],
                   'unique':[{0,1,2},{2,3,4},{1,5,6},{0,1,2},{3,4,5},{4,5,6}],
                   'new':[{0,1,2},{3,4},{5,6},{0,1,2},{3,4,5},{6}]}).set_index(['category','row'])

It looks like this:

Category  row  unique    new      
A          1   {0,1,2}  {0,1,2}
           2   {2,3,4}    {3,4}
           3   {1,5,6}    {5,6}   

B          1   {0,1,2}  {0,1,2}
           2   {3,4,5}  {3,4,5}
           3   {4,5,6}      {6}

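I am trying to apply something like A.1['new'] intersect A.2['unique'], that is, intersect each row's 'unique' set with the 'new' set of the previous row in the same category.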
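To make the rule concrete, here is a minimal plain-Python sketch for category A only. The list names `unique`, `new`, and `previous_row_returned` are illustrative and simply mirror the column names; this shows the operation being asked for, not a pandas solution.

```python
# Values copied from category A of the example data.
unique = [{0, 1, 2}, {2, 3, 4}, {1, 5, 6}]
new = [{0, 1, 2}, {3, 4}, {5, 6}]

# Row 1 has no previous row, so it gets None.
# Every later row intersects its 'unique' with the previous row's 'new'.
previous_row_returned = [None] + [u & n for u, n in zip(unique[1:], new[:-1])]

print(previous_row_returned)
```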

Expected result:

Category  row  unique    new      Previous Row Returned
A          1   {0,1,2}  {0,1,2}          None
           2   {2,3,4}    {3,4}           {2}
           3   {1,5,6}    {5,6}            {}

B          1   {0,1,2}  {0,1,2}          None
           2   {3,4,5}  {3,4,5}            {}
           3   {4,5,6}      {6}         {4,5}

How should I go about this?

1 Answer:

Answer 0 (score: 0)

Working with non-scalar values (such as sets) in pandas tends to be slow, but if you need it:

# shift the 'new' values down by one row within each group
df['Previous Row Returned'] = df.groupby(level=0)['new'].shift()
# boolean mask: only rows that have a previous row (non-missing values)
mask = df['Previous Row Returned'].notnull()
# intersect each row's 'unique' with the shifted 'new'
f = lambda x: x['unique'].intersection(x['Previous Row Returned'])
df.loc[mask, 'Previous Row Returned'] = df.loc[mask].apply(f, axis=1)
print(df)
                 unique        new Previous Row Returned
Category row                                            
A        1    {0, 1, 2}  {0, 1, 2}                   NaN
         2    {2, 3, 4}     {3, 4}                   {2}
         3    {1, 5, 6}     {5, 6}                    {}
B        1    {0, 1, 2}  {0, 1, 2}                   NaN
         2    {3, 4, 5}  {3, 4, 5}                    {}
         3    {4, 5, 6}        {6}                {4, 5}
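As a possible variation on the same idea, the sketch below keeps the per-group `shift()` but does the intersection in a plain list comprehension instead of a row-wise `apply`. It is self-contained and rebuilds the example data; the helper name `prev_new` is an assumption made for illustration, and rows without a previous row are filled with `None` rather than `NaN`.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'row': [1, 2, 3, 1, 2, 3],
    'unique': [{0, 1, 2}, {2, 3, 4}, {1, 5, 6}, {0, 1, 2}, {3, 4, 5}, {4, 5, 6}],
    'new': [{0, 1, 2}, {3, 4}, {5, 6}, {0, 1, 2}, {3, 4, 5}, {6}],
}).set_index(['category', 'row'])

# Previous row's 'new' within each category; the first row of each group is NaN.
prev_new = df.groupby(level=0)['new'].shift()

# Intersect in plain Python; use None where there is no previous row.
df['Previous Row Returned'] = [
    u & p if isinstance(p, set) else None
    for u, p in zip(df['unique'], prev_new)
]

print(df)
```

Because the values are Python sets stored in an object column, neither version is vectorized; the comprehension only avoids building a Series for every row.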