Question

I have a table:

country | name  | medals_won | year
-----------------------------------
US      | sarah |      1     | 2010
US      | sarah |      2     | 2011
US      | sarah |      5     | 2015
US      | alice |      3     | 2010
US      | alice |      4     | 2012
US      | alice |      1     | 2015
AU      | jones |      2     | 2013
AU      | jones |      8     | 2015

I want to turn it into:

country | name  | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
---------------------------------------------------------
US      | sarah | 1    | 2    | 0    | 0    | 0    | 5
US      | alice | 3    | 0    | 4    | 0    | 0    | 1
AU      | jones | 0    | 0    | 0    | 2    | 0    | 8

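I have fiddled with df.apply and even brute-force iteration, but as you might guess, the tricky part is that the year values in the rows are not strictly sequential, so this is not a simple transpose operation. For example, nobody won any medals in 2014, but I would like the resulting table to show that year as a column full of zeros.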

Answer 1

You can use set_index + unstack:

df = df.set_index(['country','name','year'])['medals_won'].unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones     0     0     0     2     8
US      alice     3     0     4     0     1
        sarah     1     2     0     0     5
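
Note that unstack only creates columns for years that occur in the data, so 2014 is missing from the output above. A minimal sketch of one way to add it, assuming every year between the first and last observed year should get a column (the names wide and all_years are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

wide = df.set_index(['country', 'name', 'year'])['medals_won'].unstack(fill_value=0)

# Assumption: the wanted columns are all years from the earliest to the latest one seen.
all_years = list(range(int(df['year'].min()), int(df['year'].max()) + 1))

# reindex adds the missing year columns (here 2014) and fills them with 0.
wide = wide.reindex(columns=all_years, fill_value=0)
print(wide)
```

If the range of years is known in advance, a fixed list such as range(2010, 2016) could be passed to reindex instead.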

If there are duplicates that need to be aggregated, e.g. with mean or sum, use pivot_table or groupby + an aggregate function + unstack:

print (df)
  country   name  medals_won  year
0      US  sarah           1  2010 <-same US  sarah 2010, different 1
1      US  sarah           4  2010 <-same US  sarah 2010, different 4
2      US  sarah           2  2011
3      US  sarah           5  2015
4      US  alice           3  2010
5      US  alice           4  2012
6      US  alice           1  2015
7      AU  jones           2  2013
8      AU  jones           8  2015

df = df.pivot_table(index=['country','name'], 
                    columns='year', 
                    values='medals_won', 
                    fill_value=0, 
                    aggfunc='mean')
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0     0     0     2     8
US      alice   3.0     0     4     0     1
        sarah   2.5     2     0     0     5 <- (1+4)/2 = 2.5

Alternatively:

df = df.groupby(['country','name','year'])['medals_won'].mean().unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0   0.0   0.0   2.0   8.0
US      alice   3.0   0.0   4.0   0.0   1.0
        sarah   2.5   2.0   0.0   0.0   5.0
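
For medal counts, a total is probably more useful than a mean. A small sketch of the same groupby + unstack approach with sum instead of mean, assuming duplicated rows should simply be added together (totals is an illustrative name):

```python
import pandas as pd

# Same data as above, including the duplicated (US, sarah, 2010) row.
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 4, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# Sum the duplicates per (country, name, year), then move the years into columns.
totals = df.groupby(['country', 'name', 'year'])['medals_won'].sum().unstack(fill_value=0)
print(totals)
```

Because sum of integers stays an integer and fill_value=0 avoids introducing NaN, the result should keep an integer dtype rather than switching to floats.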

Finally, convert the index levels back into regular columns and remove the columns' axis name:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
  country   name  2010  2011  2012  2013  2015
0      AU  jones     0     0     0     2     8
1      US  alice     3     0     4     0     1
2      US  sarah     1     2     0     0     5

Answer 2

You can use pandas' pivot_table() function and fill the NaN values with zeros using DataFrame.fillna(0):

    import pandas as pd

    df = pd.DataFrame({
        'country' : pd.Series(['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU']),
        'name' : pd.Series(['sarah', 'sarah','sarah','alice','alice','alice','jones','jones']),
        'medals_won' : pd.Series([1,2,5,3,4,1,2,8]),
        'year': pd.Series([2010,2011,2015,2010,2012,2015,2013,2015])    
        })
    # No values= argument is given, so the columns become a (medals_won, year) MultiIndex
    result = pd.pivot_table(df, index=['country','name'], columns='year', aggfunc='sum').fillna(0)
    print(result)
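
As a hedged variation on the call above: passing values='medals_won' should give single-level year columns, and fill_value=0 fills the gaps inside pivot_table itself instead of in a separate fillna step. The names nested and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# Without values=, the columns are a two-level MultiIndex: (medals_won, year).
nested = pd.pivot_table(df, index=['country', 'name'], columns='year',
                        aggfunc='sum').fillna(0)

# With values= given as a single column name, only the year level remains.
flat = pd.pivot_table(df, index=['country', 'name'], columns='year',
                      values='medals_won', aggfunc='sum', fill_value=0)
print(flat)
```

As in the other approach, years that never occur in the data (2014 here) still get no column; they would have to be added separately, for example with reindex.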

