pd.sort_values is not doing what it should

Time: 2017-08-07 12:21:28

Tags: python pandas csv

I have a CSV file that I imported with df = pd.read_csv("af.csv")

The CSV file looks like this (preview):

"match_id","start_time","win","leaguename","opposing_team","team","min"
2992096687,1486840800,true,"Captains Draft",3729377,2642171,1453382256
2992217489,1486845476,true,"Captains Draft",3729377,2642171,1453382256
2994454005,1486926905,false,"Captains Draft",2586976,2642171,1453382256
2659805546,1474478411,false,"BTSSeries",55,2642171,1454281287
2659879628,1474481141,false,"BTSSeries",55,2642171,1454281287
2661783205,1474563571,false,"BTSSeries",2537636,2642171,1454281287
2661875544,1474566865,false,"BTSSeries",2537636,2642171,1454281287
2662027296,1474573160,true,"BTSSeries",59,2642171,1454281287
2758086417,1478352060,true,"ESLManila16",2163,2642171,1454692269
2758241073,1478355547,true,"ESLManila16",2163,2642171,1454692269
2747710178,1477941012,false,"ESLFrankfurt16",2850016,2642171,1459782261
2747808587,1477945318,true,"ESLFrankfurt16",2850016,2642171,1459782261
2747861268,1477947994,true,"ESLFrankfurt16",2850016,2642171,1459782261

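What I want to do is keep the first match of each league, count the number of wins across all of that league's matches (true is a win, false is a loss), and then sort the result by start_time.

I have the following code to do this: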

df1 = df.groupby(['leaguename', 'team']).sum().reset_index()
df1 = df1[['win','leaguename','team']]

df2 = df.sort_values("start_time").groupby("leaguename", as_index=False).first()
df2 = df2[['leaguename', 'start_time']]

output = pd.merge(df1, df2, 'inner', on = 'leaguename')

The output comes back with start_time jumbled and unordered:

,win,leaguename,team,start_time
0,5.0,ASUSROGSeason6,2642171,1478022101
1,6.0,CaptainsDraft,2642171,1486840800
2,3.0,Dota2Asia17,2642171,1486130597
3,2.0,DotaPitSeason5,2642171,1476903919
4,5.0,ESLFrankfurt16,2642171,1477941012
5,2.0,ESLManila16,2642171,1478352060
6,6.0,GlobalGrandMasters,2642171,1466176095
7,4.0,NanyangChampionshipsSeason2,2642171,1464178206

Desired output:

,win,leaguename,team,start_time
0,4.0,NanyangChampionshipsSeason2,2642171,1464178206
1,6.0,GlobalGrandMasters,2642171,1466176095
2,2.0,DotaPitSeason5,2642171,1476903919
3,5.0,ESLFrankfurt16,2642171,1477941012
4,5.0,ASUSROGSeason6,2642171,1478022101
5,2.0,ESLManila16,2642171,1478352060
6,3.0,Dota2Asia17,2642171,1486130597
7,6.0,CaptainsDraft,2642171,1486840800

How can I achieve the desired output?

1 answer:

Answer 0: (score: 0)

I think you need DataFrame.sort_values by start_time, followed by reset_index with the parameter drop=True to get back a default unique monotonic index:

output = output.sort_values('start_time').reset_index(drop=True)
# data from the output sample
print(output)
   win                   leaguename     team  start_time
0  4.0  NanyangChampionshipsSeason2  2642171  1464178206
1  6.0           GlobalGrandMasters  2642171  1466176095
2  2.0               DotaPitSeason5  2642171  1476903919
3  5.0               ESLFrankfurt16  2642171  1477941012
4  5.0               ASUSROGSeason6  2642171  1478022101
5  2.0                  ESLManila16  2642171  1478352060
6  3.0                  Dota2Asia17  2642171  1486130597
7  6.0                CaptainsDraft  2642171  1486840800
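As a rough, self-contained sketch of why the merged frame comes out unordered and what the fix changes: groupby sorts its group keys by default, so both intermediate frames (and therefore the merge result) end up in alphabetical league order. The inline rows below are a small illustrative subset of the question's sample, assumed here only so the snippet runs without af.csv.

```python
import pandas as pd

# Illustrative subset of the question's sample rows (an assumption, not the full af.csv).
df = pd.DataFrame({
    "start_time": [1486840800, 1486845476, 1474478411, 1474573160, 1478352060],
    "win": [True, True, False, True, True],
    "leaguename": ["Captains Draft", "Captains Draft", "BTSSeries", "BTSSeries", "ESLManila16"],
    "team": [2642171] * 5,
})

# Wins per league and team. groupby sorts the keys alphabetically by default.
df1 = df.groupby(["leaguename", "team"]).sum().reset_index()
df1 = df1[["win", "leaguename", "team"]]

# First (earliest) match per league.
df2 = df.sort_values("start_time").groupby("leaguename", as_index=False).first()
df2 = df2[["leaguename", "start_time"]]

# The merge keeps the alphabetical league order of df1.
output = pd.merge(df1, df2, how="inner", on="leaguename")

# Order by each league's first match and rebuild a clean 0..n-1 index.
output = output.sort_values("start_time").reset_index(drop=True)
print(output)
```

Without reset_index(drop=True) the rows would still be in the right order, but they would carry their old index labels from the merge.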

Another solution is to add sort=False to both groupby calls, so that the groups keep the order in which they first appear instead of being sorted by key:

df1 = df.groupby(['leaguename', 'team'], sort=False).sum().reset_index()
df1 = df1[['win','leaguename','team']]

df2 = df.sort_values("start_time").groupby("leaguename", as_index=False, sort=False).first()
df2 = df2[['leaguename', 'start_time']]

output = pd.merge(df1, df2, on = 'leaguename')
# data from the input sample
print(output)
   win      leaguename     team  start_time
0  2.0  Captains Draft  2642171  1486840800
1  1.0       BTSSeries  2642171  1474478411
2  2.0     ESLManila16  2642171  1478352060
3  2.0  ESLFrankfurt16  2642171  1477941012
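A small sketch of what sort=False changes, on made-up rows that are deliberately not in chronological order (the data and variable names are illustrative assumptions). It suggests that sort=False alone gives first-appearance order, and that sorting the frame by start_time once before grouping is what makes that order match each league's first match.

```python
import pandas as pd

# Made-up rows, intentionally out of chronological order.
df = pd.DataFrame({
    "start_time": [1486840800, 1474478411, 1486845476, 1478352060, 1474573160],
    "win": [True, False, True, True, True],
    "leaguename": ["Captains Draft", "BTSSeries", "Captains Draft", "ESLManila16", "BTSSeries"],
    "team": [2642171] * 5,
})

# Default behaviour: group keys are sorted alphabetically.
default_order = df.groupby("leaguename")["win"].sum().index.tolist()

# sort=False: groups appear in the order they are first seen in df.
appearance_order = df.groupby("leaguename", sort=False)["win"].sum().index.tolist()

# Sorting by start_time first makes "first seen" mean "earliest match".
by_time = df.sort_values("start_time")
first_match_order = by_time.groupby("leaguename", sort=False)["win"].sum().index.tolist()

print(default_order)
print(appearance_order)
print(first_match_order)
```

If the input file is not already ordered by start_time, the sort_values approach from the first solution is the more direct way to get the desired ordering.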
