Convert a column containing lists of tuples into multiple columns

Time: 2018-06-25 19:15:19

Tags: python list pandas tuples

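I have a dataframe in which one column contains ragged lists of tuples. The tuples all have the same length; only the lists differ in length. I want to melt this column so that the tuple elements become new columns appended to the existing ones, with the other values duplicated across the new rows. Like this: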

df
   name     id       list_of_tuples
0  john doe    abc-123  [('cat',100,'xyz-123'),('cat',96,'uvw-456')]
1  bob smith    def-456  [('dog',98,'rst-789'),('dog',97,'opq-123'),('dog',95,'lmn-123')]
2  bob parr    ghi-789  [('tree',100,'ijk-123')]

df_new
   name            id       val_1 val_2 val_3
0  john doe        abc-123  cat   100   xyz-123
1  john doe        abc-123  cat   96    uvw-456
2  bob smith       def-456  dog   98    rst-789
3  bob smith       def-456  dog   97    opq-123
4  bob smith       def-456  dog   95    lmn-123
5  bob parr        ghi-789  tree  100   ijk-123

My current approach builds a new dataframe using chain from itertools, but I would like to avoid creating a second dataframe and then merging it back on the 'id' column.

Here is my current code:

from itertools import chain
import numpy as np, pandas as pd
df_new = pd.DataFrame(list(chain.from_iterable(df['list_of_tuples'])), columns=['val_1', 'val_2', 'val_3'])
df_new['id'] = np.repeat(df['id'].values, df['list_of_tuples'].str.len())
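A sketch of one way to keep the `chain.from_iterable` idea while dropping the merge on `id`: instead of repeating only `id` with `np.repeat`, repeat whole rows with `df.index.repeat`, so `name`, `id` and any other column travel together. The sample frame is rebuilt from the tables above, and the names `lengths`, `base` and `vals` are only illustrative.

```python
from itertools import chain

import pandas as pd

# Sample frame rebuilt from the example tables above.
df = pd.DataFrame({
    'name': ['john doe', 'bob smith', 'bob parr'],
    'id': ['abc-123', 'def-456', 'ghi-789'],
    'list_of_tuples': [
        [('cat', 100, 'xyz-123'), ('cat', 96, 'uvw-456')],
        [('dog', 98, 'rst-789'), ('dog', 97, 'opq-123'), ('dog', 95, 'lmn-123')],
        [('tree', 100, 'ijk-123')],
    ],
})

# Number of tuples in each row.
lengths = df['list_of_tuples'].str.len().to_numpy()

# Repeat every original row once per tuple, carrying name, id and any other column along.
base = df.drop(columns='list_of_tuples').loc[df.index.repeat(lengths)].reset_index(drop=True)

# Flatten all tuples in row order, as in the question's chain.from_iterable call.
vals = pd.DataFrame(list(chain.from_iterable(df['list_of_tuples'])),
                    columns=['val_1', 'val_2', 'val_3'])

# Both frames share the same default 0..n-1 index, so join lines them up row by row.
df_new = base.join(vals)
print(df_new)
```

The join is on the index rather than on `id`, so it does not depend on `id` being unique.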

2 answers:

Answer 0 (score: 2)

Unnest your lists, then concat:

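import numpy as np, pandas as pd
s = df['list_of_tuples']
ids = df['id'].repeat(s.str.len()).reset_index(drop=True)        # each id repeated once per tuple
pd.concat([ids, pd.DataFrame(np.concatenate(s.values))], axis=1)  # flattened tuples become columns 0, 1, 2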
Out[118]: 
        id     0    1        2
0  abc-123   cat  100  xyz-123
1  abc-123   cat   96  uvw-456
2  def-456   dog   98  rst-789
3  def-456   dog   97  opq-123
4  def-456   dog   95  lmn-123
5  ghi-789  tree  100  ijk-123
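One thing to be aware of with `np.concatenate` here: a NumPy array has a single dtype, so tuples that mix strings and integers are converted to a Unicode string array, and column `1` ends up holding `'100'` rather than `100`. A minimal sketch that shows this and converts the column back with `pd.to_numeric` (the Series `s` is rebuilt from the question's data; `arr` and `vals` are illustrative names):

```python
import numpy as np
import pandas as pd

s = pd.Series([
    [('cat', 100, 'xyz-123'), ('cat', 96, 'uvw-456')],
    [('dog', 98, 'rst-789'), ('dog', 97, 'opq-123'), ('dog', 95, 'lmn-123')],
    [('tree', 100, 'ijk-123')],
])

# Each list of tuples becomes a 2-D array; mixed str/int values are promoted to a string dtype.
arr = np.concatenate(s.values)
print(arr.dtype.kind, repr(arr[0, 1]))

vals = pd.DataFrame(arr)
# Turn the numeric column back into integers.
vals[1] = pd.to_numeric(vals[1])
print(vals.dtypes)
```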

Answer 1 (score: 1)

Use apply together with pd.Series:

import pandas as pd
df_new = (
    df.set_index('id')['list_of_tuples']  # set id as the index and select the list_of_tuples column
      .apply(pd.Series)                   # split each list into columns, padding shorter lists with NaN
      .stack()                            # stack the list elements vertically
      .apply(pd.Series)                   # split each tuple into columns 0, 1, 2
      .add_prefix('val_')                 # prefix every column name with val_
      .reset_index()                      # move id back into the frame as a column
      .drop('level_1', axis=1)            # drop the unneeded level_1 column left over from stack
)

Output:

        id val_0  val_1    val_2
0  abc-123   cat    100  xyz-123
1  abc-123   cat     96  uvw-456
2  def-456   dog     98  rst-789
3  def-456   dog     97  opq-123
4  def-456   dog     95  lmn-123
5  ghi-789  tree    100  ijk-123
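To see why this chain works, it helps to look at the intermediate frames. The first `apply(pd.Series)` gives one column per list position, padding shorter lists with NaN; `stack()` then turns that into one row per (id, position) pair, and the second `apply(pd.Series)` splits each tuple. Whether `stack()` drops NaN on its own has changed across pandas versions, so this sketch adds an explicit `dropna()` to be safe. The names `wide`, `long` and `vals` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['abc-123', 'def-456', 'ghi-789'],
    'list_of_tuples': [
        [('cat', 100, 'xyz-123'), ('cat', 96, 'uvw-456')],
        [('dog', 98, 'rst-789'), ('dog', 97, 'opq-123'), ('dog', 95, 'lmn-123')],
        [('tree', 100, 'ijk-123')],
    ],
})

# One column per list position (0, 1, 2); shorter lists are padded with NaN.
wide = df.set_index('id')['list_of_tuples'].apply(pd.Series)
print(wide)

# One row per (id, position); drop the NaN padding explicitly.
long = wide.stack().dropna()
print(long)

# Split each tuple into its own columns, as the answer does next.
vals = long.apply(pd.Series).add_prefix('val_')
print(vals)
```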

Edit, to handle the question's edit that added 'name' to the dataframe:

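df_new = (
    df.set_index(['name', 'id'])['list_of_tuples']
      .apply(pd.Series)
      .stack()
      .apply(pd.Series)
      .add_prefix('val_')
      .reset_index(level=-1, drop=True)  # discard the list-position level created by stack
      .reset_index()                     # move name and id back into the frame as columns
)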

Output:

        name       id val_0  val_1    val_2
0   John Doe  abc-123   cat    100  xyz-123
1   John Doe  abc-123   cat     96  uvw-456
2  Bob Smith  def-456   dog     98  rst-789
3  Bob Smith  def-456   dog     97  opq-123
4  Bob Smith  def-456   dog     95  lmn-123
5   Bob Parr  ghi-789  tree    100  ijk-123
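For readers on newer pandas: `DataFrame.explode`, added in pandas 0.25 (after this question was asked), duplicates the rows directly, which avoids both the merge and the `apply(pd.Series)` calls. A sketch under that assumption, using the question's `val_1`..`val_3` column names; it also assumes no row holds an empty list, since `explode` turns an empty list into a single NaN row.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john doe', 'bob smith', 'bob parr'],
    'id': ['abc-123', 'def-456', 'ghi-789'],
    'list_of_tuples': [
        [('cat', 100, 'xyz-123'), ('cat', 96, 'uvw-456')],
        [('dog', 98, 'rst-789'), ('dog', 97, 'opq-123'), ('dog', 95, 'lmn-123')],
        [('tree', 100, 'ijk-123')],
    ],
})

# One row per tuple; name and id are repeated automatically.
exploded = df.explode('list_of_tuples').reset_index(drop=True)

# Split the tuples into three columns; the default index matches exploded's 0..n-1 index.
vals = pd.DataFrame(exploded['list_of_tuples'].tolist(),
                    columns=['val_1', 'val_2', 'val_3'])

df_new = exploded.drop(columns='list_of_tuples').join(vals)
print(df_new)
```

Building the value columns from a plain list of tuples lets pandas infer a dtype per column, so `val_2` comes out as integers.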