Reshaping a DataFrame: pivot, stack, or groupby?

Asked: 2018-12-28 05:05:36

Tags: python pandas

I have this DataFrame:

Playlist    Track Name    Spotify Uri               Playlist Uri
microhouse  make a move   5nUS4bSN0cFZB0knxyM4LZ    1d4gyZxan7lK9KqYU2EJ    
microhouse  mango         2f8eSlsreAHHzJ5SPkpYLf    1d4gyZxan7lK9KqYU2EJ    
attlas      ryat          3McvalY1RDYczyDmixyAwQ    2CInjKguWauO29QB21Co
attlas      further       4qEUN1lON8UjnUiOZc39ID    2CInjKguWauO29QB21Co

I want it to look like this:

Playlist         microhouse                         attlas      
Playlist Uri     1d4gyZxan7lK9KqY                   2CInjKguWauO29Q                      
                 Track Name      Spotify Uri        Track Name   Spotify Uri  
                 make a move     5nUS4bSN0cFZB0kn   ryat         3valY1RDYc
                 mango           2f8eSlsreAHHzJ5S   further      4qEUN1lON

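I have used pivot to produce one column per playlist holding all of that playlist's track names, but I can't work out how to do this with a MultiIndex (Playlist and Playlist Uri), no aggregation, and two value columns (Track Name and Spotify Uri). stack didn't really do what I want. Any help is appreciated.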

1 answer:

Answer 0 (score: 2)

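You can number the rows within each playlist with cumcount and use that counter as the new index, then move Playlist and Playlist Uri into the columns with set_index followed by unstack, which gives a three-level MultiIndex in the columns. If necessary, finish by sorting the columns on the second level with sort_index, changing the order of the levels with reorder_levels, and restoring the order of the value columns with reindex: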

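# Number the rows within each playlist (0, 1, ...); this becomes the new row index.
g = df.groupby(['Playlist', 'Playlist Uri']).cumcount()
df = (df.set_index([g, 'Playlist', 'Playlist Uri'])
        .unstack([1, 2])                      # move both playlist levels into the columns
        .sort_index(axis=1, level=1)          # sort the columns by playlist
        .reorder_levels([1, 2, 0], axis=1)    # Playlist, Playlist Uri, then the value names
        .reindex(['Track Name', 'Spotify Uri'], axis=1, level=2))  # Track Name first
print(df)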
Playlist                   attlas                          \
Playlist Uri 2CInjKguWauO29QB21Co                           
                       Track Name             Spotify Uri   
0                            ryat  3McvalY1RDYczyDmixyAwQ   
1                         further  4qEUN1lON8UjnUiOZc39ID   

Playlist               microhouse                          
Playlist Uri 1d4gyZxan7lK9KqYU2EJ                          
                       Track Name             Spotify Uri  
0                     make a move  5nUS4bSN0cFZB0knxyM4LZ  
1                           mango  2f8eSlsreAHHzJ5SPkpYLf  

print(df.columns)
MultiIndex(levels=[['attlas', 'microhouse'], 
                   ['1d4gyZxan7lK9KqYU2EJ', '2CInjKguWauO29QB21Co'], 
                   ['Track Name', 'Spotify Uri']],
           labels=[[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]],
           names=['Playlist', 'Playlist Uri', None])
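
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same chain. The DataFrame is rebuilt by hand from the four rows shown in the question, and the name `wide` is only an illustrative choice so that the input `df` is not overwritten. How `wide.columns` is printed may differ from the listing above depending on the pandas version.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Playlist': ['microhouse', 'microhouse', 'attlas', 'attlas'],
    'Track Name': ['make a move', 'mango', 'ryat', 'further'],
    'Spotify Uri': ['5nUS4bSN0cFZB0knxyM4LZ', '2f8eSlsreAHHzJ5SPkpYLf',
                    '3McvalY1RDYczyDmixyAwQ', '4qEUN1lON8UjnUiOZc39ID'],
    'Playlist Uri': ['1d4gyZxan7lK9KqYU2EJ', '1d4gyZxan7lK9KqYU2EJ',
                     '2CInjKguWauO29QB21Co', '2CInjKguWauO29QB21Co'],
})

# Position of each track within its playlist: 0, 1, ... per group.
g = df.groupby(['Playlist', 'Playlist Uri']).cumcount()

# `wide` is a hypothetical name; the answer reassigns `df` instead.
wide = (df.set_index([g, 'Playlist', 'Playlist Uri'])
          .unstack([1, 2])
          .sort_index(axis=1, level=1)
          .reorder_levels([1, 2, 0], axis=1)
          .reindex(['Track Name', 'Spotify Uri'], axis=1, level=2))

print(wide)

# One playlist's tracks can be pulled out with a full column key.
print(wide[('microhouse', '1d4gyZxan7lK9KqYU2EJ', 'Track Name')])
```

Because the row index is just the per-playlist counter, playlists of different lengths should still reshape without aggregation; the shorter ones would be padded with NaN.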