Aggregate data and get the sum and count

Time: 2015-12-26 21:06:37

Tags: python pandas group-by aggregate

I have an object in Python with many rows:

INPUT:

    Team1     Player1     idTrip13     133
    Team2     Player333   idTrip10     18373
    Team3     Player22    idTrip12     17338899
    Team2     Player293   idTrip02     17656
    Team3     Player20    idTrip11     1883
    Team1     Player1     idTrip19     19393

I need to aggregate this data (like a pivot table).

OUTPUT I am aiming for:

Team1   Player1 : 2 trips : sum(133+19393)
Team2   Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
Team3   Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883

Can someone suggest a suitable object in Python so that I can produce output like the following?

print team, player, trips, time

1 answer:

Answer 0: (score: 8)

pandas DataFrames

Use the groupby functionality
  1. Put your data into a list of lists, where each inner list is one row of the DataFrame.

    In[1]:
    import pandas as pd
    
    mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
    ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293','idTrip02', 17656], 
    ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
    
    df = pd.DataFrame(mydata, columns = ['team', 'player', 'trips', 'time'])
    
    df
    Out[1]:
         team    player       trips      time
    0   Team1   Player1     idTrip13    133
    1   Team2   Player333   idTrip10    18373
    2   Team3   Player22    idTrip12    17338899
    3   Team2   Player293   idTrip02    17656
    4   Team3   Player20    idTrip11    1883
    5   Team1   Player1     idTrip19    19393
    
  2. Call groupby(), passing the column(s) you want to use as the grouper, and apply a function to the resulting groups.

  3. Examples

    Example 1: Find the number of trips made by each team. team is the grouper, and we apply the count() function to the ['trips'] column.

    In[2]:
    trip_count = df.groupby(by = ['team'])['trips'].count() 
    
    trip_count              
    Out[2]:          
    
     team
    Team1    2
    Team2    2
    Team3    2
    Name: trips, dtype: int64
    

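    Example 2 (multiple columns): Find the total time spent by each player on a team. We use two columns, ['team', 'player'], as the grouper and apply the sum() function to the ['time'] column.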

    In[3]:              
    trip_time = df.groupby(by = ['team', 'player'])['time'].sum() 
    
    trip_time        
    Out[3]:
    
     team   player   
    Team1  Player1         19526
    Team2  Player293       17656
           Player333       18373
    Team3  Player20         1883
           Player22     17338899
    Name: time, dtype: int64
    

    Example 3 (multiple functions): For each player on a team, find the total number of trips and the total trip time.

    In[4]:
    # Dict keys are listed in the same order as the columns shown in Out[4].
    player_total = df.groupby(by = ['team', 'player']).agg({'trips' : 'count', 'time' : 'sum'})
    
    player_total
    Out[4]:
                     trips  time
    team    player      
    Team1   Player1     2   19526
    Team2   Player293   1   17656
            Player333   1   18373
    Team3   Player20    1   1883
            Player22    1   17338899
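
To get from the aggregated frame to the `team, player, trips, time` lines the question asks for, one option is to flatten the group index and iterate over the rows. The following is a minimal sketch, not part of the original answer: it assumes pandas 0.25 or later (for the keyword-style named aggregation) and Python 3 (for f-strings), and the `lines` list is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

# Same sample data as in step 1 of the answer.
mydata = [
    ['Team1', 'Player1', 'idTrip13', 133],
    ['Team2', 'Player333', 'idTrip10', 18373],
    ['Team3', 'Player22', 'idTrip12', 17338899],
    ['Team2', 'Player293', 'idTrip02', 17656],
    ['Team3', 'Player20', 'idTrip11', 1883],
    ['Team1', 'Player1', 'idTrip19', 19393],
]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Named aggregation: each keyword is an output column, each value is
# an (input column, function) pair. reset_index() turns the
# (team, player) group keys back into ordinary columns.
player_total = (
    df.groupby(['team', 'player'])
      .agg(trips=('trips', 'count'), time=('time', 'sum'))
      .reset_index()
)

# Build one line per (team, player) pair; `lines` is a hypothetical
# helper list used only for this sketch.
lines = []
for row in player_total.itertuples(index=False):
    lines.append(f"{row.team} {row.player} {row.trips} {row.time}")

print("\n".join(lines))
```

Because groupby sorts the group keys by default, the rows come out ordered by team and then by player, as in Out[4] above.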