Combine a row with the row above under certain conditions

Time: 2017-10-12 15:26:50

Tags: python pandas

Table format (blank cells are empty; the columns are field and dimension):

field | dimension
-----------------
a     | 
b     | abc
e     | efg
      | xyz
r     | abc
      | def
      | xyz

Desired format:

field | dimension
-----------------
a     | [nan]
b     | [abc]
e     | [efg, xyz]
r     | [abc, def, xyz]

I tried:

df.dimension = [df.dimension]

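I also tried to find the index of each empty cell in field and combine that row with the row above. However, I got:

ValueError: Length of values does not match length of index

I also suspect there must be a better way than the one I am attempting. Thanks in advance.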

2 Answers:

Answer 0: (score: 2)

Use:

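import numpy as np
import pandas as pd

# Group by the forward-filled field; a group that is entirely NaN collapses to a single NaN
df = (df.groupby(df['field'].ffill())['dimension']
        .apply(lambda x: np.nan if x.isnull().all() else list(x))
        .reset_index())
print(df)
  field        dimension
0     a              NaN
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]

# Alternative, applied to the original df: drop rows with a NaN dimension first,
# then reindex so that fields left with no rows come back as NaN
df = (df[df['dimension'].notnull()].groupby(df['field'].ffill())['dimension']
                                   .apply(list)
                                   .reindex(pd.unique(df['field'].dropna()))
                                   .reset_index())
print(df)
  field        dimension
0     a              NaN
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]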

But if NaN values inside the lists are not a problem:

df = (df.groupby(df['field'].ffill())['dimension']
        .apply(list)
        .reset_index())
print(df)
  field        dimension
0     a            [nan]
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]
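
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same forward-fill and group idea. The sample frame is rebuilt by hand from the table in the question, and it assumes the blank cells are stored as NaN; the names `key` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "field": ["a", "b", "e", np.nan, "r", np.nan, np.nan],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

# Forward-fill `field` so each continuation row inherits the label above it,
# then group on that filled Series without modifying df itself.
key = df["field"].ffill()
result = df.groupby(key)["dimension"].apply(list).reset_index()
print(result)
```

Grouping on a separate Series rather than overwriting `df['field']` leaves the original frame untouched, which is convenient when comparing several variants against the same input.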

Answer 1: (score: 1)

Let's try:

df['field'] = df['field'].ffill()
df_out = df.groupby('field')['dimension'].apply(list).reset_index()

Output:

  field        dimension
0     a            [nan]
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]
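
One caveat: `ffill()` only fills missing values (NaN/None), so the approach above relies on the blank cells being NaN. The question does not say how the data was loaded. If the blanks turned out to be empty or whitespace-only strings, a possible sketch would be to convert them to NaN first. The frame `raw` below is a hypothetical input made up for illustration, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical input where blank cells arrived as empty or whitespace-only strings.
raw = pd.DataFrame({
    "field": ["a", "b", "e", "", "r", " ", ""],
    "dimension": ["", "abc", "efg", "xyz", "abc", "def", "xyz"],
})

# Turn empty / whitespace-only strings into NaN so that ffill() can fill them.
cleaned = raw.replace(r"^\s*$", np.nan, regex=True)

# Same grouping as before: forward-filled field as the key, dimensions collected into lists.
combined = (cleaned.groupby(cleaned["field"].ffill())["dimension"]
                   .apply(list)
                   .reset_index())
print(combined)
```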