Question

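I have a dataframe in which one column contains ragged lists of tuples. The tuples all have the same length; only the lists vary in length. I would like to melt this column so that new columns are appended to the existing ones and the rows are duplicated. Like so: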

df
   name     id       list_of_tuples
0  john doe    abc-123  [('cat',100,'xyz-123'),('cat',96,'uvw-456')]
1  bob smith    def-456  [('dog',98,'rst-789'),('dog',97,'opq-123'),('dog',95,'lmn-123')]
2  bob parr    ghi-789  [('tree',100,'ijk-123')]

df_new
   name            id       val_1 val_2 val_3
0  john doe        abc-123  cat   100   xyz-123
1  john doe        abc-123  cat   96    uvw-456
2  bob smith       def-456  dog   98    rst-789
3  bob smith       def-456  dog   97    opq-123
4  bob smith       def-456  dog   95    lmn-123
5  bob parr        ghi-789  tree  100   ijk-123

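With my current approach I build a new dataframe using the chain function from itertools, but I would like to avoid creating a second dataframe and merging it back on the 'id' column.

This is my current code: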

from itertools import chain
# assumes numpy is imported as np and pandas as pd
df_new = pd.DataFrame(list(chain.from_iterable(df.list_of_tuples)), columns=['val_1','val_2','val_3'])
df_new['id'] = np.repeat(df.id.values, df['list_of_tuples'].str.len())

Answer 1

Unnest your lists, then concat:

s=df.list_of_tuples
pd.concat([pd.DataFrame({'id':df.id.repeat(s.str.len())}).reset_index(drop=True),pd.DataFrame(np.concatenate(s.values))],axis=1)
Out[118]: 
        id     0    1        2
0  abc-123   cat  100  xyz-123
1  abc-123   cat   96  uvw-456
2  def-456   dog   98  rst-789
3  def-456   dog   97  opq-123
4  def-456   dog   95  lmn-123
5  ghi-789  tree  100  ijk-123
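
A minimal, self-contained sketch of the same repeat-and-concat idea, for readers who want to run it end to end. It rebuilds the question's sample frame and swaps np.concatenate for itertools.chain, on the assumption that mixed string/integer tuples passed through np.concatenate typically come back as a string array, so the numeric column may end up as text. The names lengths, left and right are illustrative only and do not come from the answer.

```python
from itertools import chain

import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "name": ["john doe", "bob smith", "bob parr"],
    "id": ["abc-123", "def-456", "ghi-789"],
    "list_of_tuples": [
        [("cat", 100, "xyz-123"), ("cat", 96, "uvw-456")],
        [("dog", 98, "rst-789"), ("dog", 97, "opq-123"), ("dog", 95, "lmn-123")],
        [("tree", 100, "ijk-123")],
    ],
})

s = df["list_of_tuples"]
lengths = s.map(len).to_numpy()  # number of tuples in each row's list

# Repeat each row's name/id once per tuple in its list.
left = df[["name", "id"]].loc[df.index.repeat(lengths)].reset_index(drop=True)

# Flatten the lists of tuples; building the frame from Python tuples
# lets pandas infer a separate dtype for each column.
right = pd.DataFrame(list(chain.from_iterable(s)), columns=["val_1", "val_2", "val_3"])

df_new = pd.concat([left, right], axis=1)
print(df_new)
```

Because both halves are reset to a fresh 0..n-1 index before the concat, they line up row by row without any merge on id.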

Answer 2

Let's use apply with pd.Series:

(df.set_index('id').list_of_tuples  # set id as the index and select the list_of_tuples column
   .apply(pd.Series)                # expand each list into one column per element
   .stack()                         # stack the elements vertically
   .apply(pd.Series)                # expand each tuple into one column per field
   .add_prefix('val_')              # prefix every column name with val_
   .reset_index()                   # move id back into the frame as a column
   .drop('level_1', axis=1))        # drop the unneeded level_1 column left by stack

Output:

        id val_0  val_1    val_2
0  abc-123   cat    100  xyz-123
1  abc-123   cat     96  uvw-456
2  def-456   dog     98  rst-789
3  def-456   dog     97  opq-123
4  def-456   dog     95  lmn-123
5  ghi-789  tree    100  ijk-123

Edited to handle the question's edit, which added 'name' to the dataframe:

(df.set_index(['name','id']).list_of_tuples
   .apply(pd.Series)
   .stack()
   .apply(pd.Series)
   .add_prefix('val_')
   .reset_index(level=-1,drop=True)
   .reset_index())

Output:

        name       id val_0  val_1    val_2
0   John Doe  abc-123   cat    100  xyz-123
1   John Doe  abc-123   cat     96  uvw-456
2  Bob Smith  def-456   dog     98  rst-789
3  Bob Smith  def-456   dog     97  opq-123
4  Bob Smith  def-456   dog     95  lmn-123
5   Bob Parr  ghi-789  tree    100  ijk-123
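
As a hedged alternative to the apply/stack chain, the sketch below reaches the same shape with Series.explode, which is a different technique from the one in this answer and assumes pandas 0.25 or later. The sample frame is rebuilt from the question, and the names exploded and wide are illustrative only.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "name": ["john doe", "bob smith", "bob parr"],
    "id": ["abc-123", "def-456", "ghi-789"],
    "list_of_tuples": [
        [("cat", 100, "xyz-123"), ("cat", 96, "uvw-456")],
        [("dog", 98, "rst-789"), ("dog", 97, "opq-123"), ("dog", 95, "lmn-123")],
        [("tree", 100, "ijk-123")],
    ],
})

# One row per tuple; the (name, id) index labels are repeated automatically.
exploded = df.set_index(["name", "id"])["list_of_tuples"].explode()

# Split each tuple into its own columns, keeping the repeated index,
# then prefix the integer column labels 0, 1, 2 with val_.
wide = pd.DataFrame(exploded.tolist(), index=exploded.index).add_prefix("val_")

# Move name and id back into the frame as ordinary columns.
df_new = wide.reset_index()
print(df_new)
```

This avoids calling pd.Series once per row, which tends to be slow on large frames, at the cost of requiring a newer pandas.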

Convert a column with a list of tuples into many columns
