Create a new column by combining a column name with values from other columns

Date: 2018-05-02 04:12:24

Tags: python pandas

I'm trying to create a new column in DF1 that lists how many All-Stars the home team had in that season.

DF1

                     Date             Visitor  V_PTS                  Home  H_PTS  \
0 2012-10-30 19:00:00  Washington Wizards     84   Cleveland Cavaliers     94   
1 2012-10-30 19:30:00    Dallas Mavericks     99    Los Angeles Lakers     91   
2 2012-10-30 20:00:00      Boston Celtics    107            Miami Heat    120   
3 2012-10-31 19:00:00    Dallas Mavericks     94             Utah Jazz    113   
4 2012-10-31 19:00:00   San Antonio Spurs     99  New Orleans Pelicans     95   

   Attendance                    Arena                 Location  Capacity  \
0       20562      Quicken Loans Arena          Cleveland, Ohio     20562   
1       18997           Staples Center  Los Angeles, California     18997   
2       20296  American Airlines Arena           Miami, Florida     19600   
3       17634  Vivint Smart Home Arena     Salt Lake City, Utah     18303   
4       15358     Smoothie King Center   New Orleans, Louisiana     16867   

  Yr Arena Opened   Season  
0            1994  2012-13  
1            1992  2012-13  
2            1999  2012-13  
3            1991  2012-13  
4            1999  2012-13 

DF2

                           2012-13  2013-14  2014-15  2015-16  2016-17
Cleveland Cavaliers           1        1        2        1        3
Los Angeles Lakers            2        1        1        1        0
Miami Heat                    3        3        2        2        1
Chicago Bulls                 2        1        2        2        1
Detroit Pistons               0        0        0        1        1
Los Angeles Clippers          2        2        2        1        1
New Orleans Pelicans          0        1        1        1        1
Philadelphia 76ers            1        0        0        0        0
Phoenix Suns                  0        0        0        0        0
Portland Trail Blazers        1        2        2        0        0
Toronto Raptors               0        1        1        2        2

DF1['H_Allstars'] = DF2[DF1['Season'], DF1['Home']]

This raises TypeError: 'Series' objects are mutable, thus they cannot be hashed

I understand the error, I'm just not sure how to do this correctly.
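Why the error happens: `DF2[DF1['Season'], DF1['Home']]` passes a tuple of two Series to `DataFrame.__getitem__`, which treats the tuple as a single column key and tries to hash it, and Series are not hashable. What is needed instead is one (team, season) lookup per row. A minimal sketch of one vectorized alternative (not taken from either answer), assuming `DF1` has `Home` and `Season` columns and `DF2` is indexed by team name with one column per season; the sample rows are a hypothetical slice of the question's data:

```python
import pandas as pd

# Small stand-ins for the question's frames (only the columns the lookup needs).
DF1 = pd.DataFrame({
    "Home": ["Cleveland Cavaliers", "Los Angeles Lakers", "Utah Jazz"],
    "Season": ["2012-13", "2012-13", "2012-13"],
})
DF2 = pd.DataFrame(
    {"2012-13": [1, 2], "2013-14": [1, 1]},
    index=["Cleveland Cavaliers", "Los Angeles Lakers"],
)

# stack() turns the wide table into a Series indexed by (team, season).
allstars = DF2.stack()

# Build one (team, season) key per game row and look them all up at once;
# pairs missing from DF2 (here Utah Jazz) come back as NaN.
keys = pd.MultiIndex.from_arrays([DF1["Home"], DF1["Season"]])

# to_numpy() drops the MultiIndex so the values are assigned by position,
# not aligned against DF1's RangeIndex.
DF1["H_Allstars"] = allstars.reindex(keys).to_numpy()

print(DF1)
```

Because one key is missing, the new column is float (`1.0`, `2.0`, `NaN`).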

2 answers:

Answer 0 (score: 0)

You can use pandas.melt. Convert df2 to long format, with Home and Season as columns and the All-Star count as the value, then merge it into df1 on 'Home' and 'Season'.

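import pandas as pd
# Move the team names out of the index into a regular 'Home' column
df2['Home'] = df2.index
# Wide -> long: one row per (Home, Season) with the count in 'H_Allstars'
df2 = pd.melt(df2, id_vars='Home', value_vars=['2012-13', '2013-14', '2014-15', '2015-16', '2016-17'], var_name='Season', value_name='H_Allstars')
# Left merge keeps every game in df1; games with no matching team/season get NaN
df = df1.merge(df2, on=['Home', 'Season'], how='left')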
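A self-contained run of the same melt-then-merge approach, using a few rows from the question's tables as hypothetical sample data (lowercase `df1`/`df2` as in this answer). As a variation, the index is turned into the `Home` column with `rename_axis` plus `reset_index` instead of assigning `df2.index`, and `value_vars` is omitted because every non-`Home` column is a season:

```python
import pandas as pd

# Hypothetical slices of the question's DF1 (games) and DF2 (All-Star counts).
df1 = pd.DataFrame({
    "Visitor": ["Washington Wizards", "Dallas Mavericks", "Dallas Mavericks"],
    "Home": ["Cleveland Cavaliers", "Los Angeles Lakers", "Utah Jazz"],
    "Season": ["2012-13", "2012-13", "2012-13"],
})
df2 = pd.DataFrame(
    {"2012-13": [1, 2, 3], "2013-14": [1, 1, 3]},
    index=["Cleveland Cavaliers", "Los Angeles Lakers", "Miami Heat"],
)

# Wide -> long: one row per (team, season) with the All-Star count as the value.
long_df2 = df2.rename_axis("Home").reset_index().melt(
    id_vars="Home", var_name="Season", value_name="H_Allstars"
)

# Left merge keeps every game; teams/seasons missing from df2 get NaN.
df = df1.merge(long_df2, on=["Home", "Season"], how="left")
print(df)
```

The Utah Jazz game gets `NaN` because that team has no row in the All-Star table.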

Answer 1 (score: 0)

I removed the extra columns and kept only the ones needed for the demonstration.

Input:

df1 (the question's DF2, with the team names moved into a Home column)

                      Home  2012-13  2013-14  2014-15  2015-16  2016-17
0      Cleveland Cavaliers        1        1        2        1        3
1       Los Angeles Lakers        2        1        1        1        0
2               Miami Heat        3        3        2        2        1
3            Chicago Bulls        2        1        2        2        1
4          Detroit Pistons        0        0        0        1        1
5     Los Angeles Clippers        2        2        2        1        1
6     New Orleans Pelicans        0        1        1        1        1
7       Philadelphia 76ers        1        0        0        0        0
8             Phoenix Suns        0        0        0        0        0
9   Portland Trail Blazers        1        2        2        0        0
10         Toronto Raptors        0        1        1        2        2

df2 (the question's DF1, reduced to the relevant columns)

              Visitor                  Home   Season
0  Washington Wizards   Cleveland Cavaliers  2012-13
1    Dallas Mavericks    Los Angeles Lakers  2012-13
2      Boston Celtics            Miami Heat  2012-13
3    Dallas Mavericks             Utah Jazz  2012-13
4   San Antonio Spurs  New Orleans Pelicans  2012-13

Step 1: Melt df1 to get an All-Stars column

df3 = pd.melt(df1, id_vars='Home', value_vars=df1.columns[df1.columns.str.contains('20')], var_name='Season', value_name='H_Allstars')

Output:

                      Home   Season   H_Allstars
0      Cleveland Cavaliers  2012-13           1
1       Los Angeles Lakers  2012-13           2
2               Miami Heat  2012-13           3
3            Chicago Bulls  2012-13           2
4          Detroit Pistons  2012-13           0
5     Los Angeles Clippers  2012-13           2
6     New Orleans Pelicans  2012-13           0
7       Philadelphia 76ers  2012-13           1
8             Phoenix Suns  2012-13           0
...
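If the only columns besides `Home` are the season columns, `value_vars` can be left out: `pd.melt` then unpivots every column not listed in `id_vars`. A small sketch on a hypothetical three-team slice of `df1`, comparing the explicit selection used above with the default:

```python
import pandas as pd

# Hypothetical three-team slice of df1 (team names plus season columns).
df1 = pd.DataFrame({
    "Home": ["Cleveland Cavaliers", "Los Angeles Lakers", "Miami Heat"],
    "2012-13": [1, 2, 3],
    "2013-14": [1, 1, 3],
})

# Explicit selection of the season columns, as in Step 1.
explicit = pd.melt(
    df1,
    id_vars="Home",
    value_vars=df1.columns[df1.columns.str.contains("20")],
    var_name="Season",
    value_name="H_Allstars",
)

# Omitting value_vars melts every column that is not an id_var.
implicit = pd.melt(df1, id_vars="Home", var_name="Season", value_name="H_Allstars")

print(implicit)
```

The explicit filter is still useful when the frame carries other non-season columns that should not be melted.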

Step 2: Merge this new DataFrame with df2 to get the H_Allstars column

df4 = pd.merge(df2, df3, how='left', on=['Home', 'Season'])

Output:

              Visitor                  Home   Season  H_Allstars
0  Washington Wizards   Cleveland Cavaliers  2012-13         1.0
1    Dallas Mavericks    Los Angeles Lakers  2012-13         2.0
2      Boston Celtics            Miami Heat  2012-13         3.0
3    Dallas Mavericks             Utah Jazz  2012-13         NaN
4   San Antonio Spurs  New Orleans Pelicans  2012-13         0.0
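H_Allstars turns into floats because the Utah Jazz have no row in the All-Star table, so the left merge fills in `NaN`. If a missing team should count as zero All-Stars (an assumption; `NaN` may be the more accurate value), the column can be filled and cast back to integers, and `indicator=True` shows which games found no match. A sketch on a hypothetical two-game slice:

```python
import pandas as pd

# Hypothetical slices of df2 (games) and df3 (melted All-Star counts).
df2 = pd.DataFrame({
    "Visitor": ["Washington Wizards", "Dallas Mavericks"],
    "Home": ["Cleveland Cavaliers", "Utah Jazz"],
    "Season": ["2012-13", "2012-13"],
})
df3 = pd.DataFrame({
    "Home": ["Cleveland Cavaliers", "Miami Heat"],
    "Season": ["2012-13", "2012-13"],
    "H_Allstars": [1, 3],
})

# indicator=True adds a _merge column: "both", or "left_only" for unmatched games.
df4 = pd.merge(df2, df3, how="left", on=["Home", "Season"], indicator=True)
print(df4.loc[df4["_merge"] == "left_only", "Home"].tolist())  # ['Utah Jazz']

# Treat a missing team as 0 All-Stars and restore the integer dtype.
df4["H_Allstars"] = df4["H_Allstars"].fillna(0).astype(int)
df4 = df4.drop(columns="_merge")
print(df4)
```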

Step 3: Add the V_Allstars column

# rename the lookup columns so they match on Visitor and produce V_Allstars
df3.rename(columns={'Home': 'Visitor', 'H_Allstars': 'V_Allstars'}, inplace=True)

df5 = pd.merge(df4, df3, how='left', on=['Visitor', 'Season'])

Output:

              Visitor                  Home   Season  H_Allstars  V_Allstars
0  Washington Wizards   Cleveland Cavaliers  2012-13         1.0         NaN
1    Dallas Mavericks    Los Angeles Lakers  2012-13         2.0         NaN
2      Boston Celtics            Miami Heat  2012-13         3.0         NaN
3    Dallas Mavericks             Utah Jazz  2012-13         NaN         NaN
4   San Antonio Spurs  New Orleans Pelicans  2012-13         0.0         NaN
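V_Allstars is all `NaN` above because none of the visiting teams in this sample appear in the trimmed All-Star table. Steps 2 and 3 can also be written without renaming `df3` in place, which keeps the original lookup table usable afterwards. The sketch below uses a hypothetical helper `add_allstars` and hypothetical sample rows; it renames a copy for the visitor merge and passes `validate="many_to_one"`, so pandas raises `MergeError` if the lookup table ever contains duplicate team/season rows (which would otherwise silently duplicate games):

```python
import pandas as pd


def add_allstars(games: pd.DataFrame, allstars_long: pd.DataFrame) -> pd.DataFrame:
    """Attach home and visitor All-Star counts to each game (hypothetical helper).

    allstars_long is expected to have columns Home, Season, H_Allstars,
    like df3 from Step 1.
    """
    # rename() returns a new frame, so allstars_long itself is left unchanged.
    visitor_lookup = allstars_long.rename(
        columns={"Home": "Visitor", "H_Allstars": "V_Allstars"}
    )
    out = games.merge(allstars_long, how="left", on=["Home", "Season"],
                      validate="many_to_one")
    out = out.merge(visitor_lookup, how="left", on=["Visitor", "Season"],
                    validate="many_to_one")
    return out


df2 = pd.DataFrame({
    "Visitor": ["Miami Heat", "Dallas Mavericks"],
    "Home": ["Cleveland Cavaliers", "Miami Heat"],
    "Season": ["2012-13", "2012-13"],
})
df3 = pd.DataFrame({
    "Home": ["Cleveland Cavaliers", "Miami Heat"],
    "Season": ["2012-13", "2012-13"],
    "H_Allstars": [1, 3],
})

df5 = add_allstars(df2, df3)
print(df5)
```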