Reshape a DataFrame with a newly added column

Time: 2018-01-10 14:33:04

Tags: python pandas


Result:

[image: the DataFrame]

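How can I make the output 6 rows, where Cin, Cout and din, dout are treated as 2 separate rows for each a and b? There should also be one added column, in/out.

In general, I need to manipulate the DataFrame above to produce the following:

[image: the desired output]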

1 Answer:

Answer 0 (score: 2)

You can use lreshape or wide_to_long:

df = pd.lreshape(dff, {'c':['c1','c2'], 'd':['d1','d2']})
print (df)
   a  b  c  d
0  1  1  3  2
1  2  3  3  8
2  3  1  3  2
3  1  1  4  3
4  2  3  1  4
5  3  1  1  1
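
The input frame appears only as an image, so the lreshape call cannot be run as posted. A minimal sketch that rebuilds the input, assuming the wide columns are named a, b, c1, c2, d1, d2 (as the lreshape call implies) and taking the values from the printed result:

```python
import pandas as pd

# Hypothetical reconstruction of the input: the column names come from the
# lreshape call and the values from the printed result.
dff = pd.DataFrame({
    'a':  [1, 2, 3],
    'b':  [1, 3, 1],
    'c1': [3, 3, 3],
    'c2': [4, 1, 1],
    'd1': [2, 8, 2],
    'd2': [3, 4, 1],
})

# Stack c1/c2 into one column c and d1/d2 into one column d;
# the remaining columns (a, b) are repeated for each block.
df = pd.lreshape(dff, {'c': ['c1', 'c2'], 'd': ['d1', 'd2']})
print(df)
```

Each of the 3 input rows appears twice, once for the c1/d1 pair and once for the c2/d2 pair, which gives the 6 rows asked for.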

Or with wide_to_long:

dff = dff.reset_index()
a = (pd.wide_to_long(dff, stubnames=['c', 'd'], i='index', j='B')
       .reset_index(drop=True)
       .reindex(columns=['a','b','c', 'd']))
print (a)
   a  b  c  d
0  1  1  3  2
1  2  3  3  8
2  3  1  3  2
3  1  1  4  3
4  2  3  1  4
5  3  1  1  1

Edit: if you also want the in/out column, use melt, extract, assign, and finally sort_values:

a = dff1.melt(id_vars=['a','b'],value_vars=['cin','cout'],value_name = 'c',var_name='in/out')
b = dff1.melt(id_vars=['a','b'],value_vars=['din','dout'],value_name = 'd',var_name='in/out')
a['in/out'] = a['in/out'].str.extract('(in|out)', expand=False)
b['in/out'] = b['in/out'].str.extract('(in|out)', expand=False)
print (a)
   a  b in/out  c
0  1  1     in  3
1  2  3     in  3
2  3  1     in  3
3  1  1    out  4
4  2  3    out  1
5  3  1    out  1

print (b)
   a  b in/out  d
0  1  1     in  2
1  2  3     in  8
2  3  1     in  2
3  1  1    out  3
4  2  3    out  4
5  3  1    out  1

c = a.assign(d=b['d']).sort_values(['a','b'])
#same as
#c = pd.merge(a,b).sort_values(['a','b'])
print (c)
   a  b in/out  c  d
0  1  1     in  3  2
3  1  1    out  4  3
1  2  3     in  3  8
4  2  3    out  1  4
2  3  1     in  3  2
5  3  1    out  1  1

Rewritten solution (pandas 0.15.0):

a = pd.melt(dff1,id_vars=['a','b'],value_vars=['cin','cout'],value_name='c',var_name='in/out')
b = pd.melt(dff1,id_vars=['a','b'],value_vars=['din','dout'],value_name='d',var_name='in/out')
a['in/out'] = a['in/out'].str.extract('(in|out)')
b['in/out'] = b['in/out'].str.extract('(in|out)')
c = pd.merge(a,b).sort_values(['a','b'])

Another solution, from wen's deleted answer: replace the strings with numbers, then use wide_to_long, and finally map the numbers back:

#define columns
L = ['in','out']
d = dict(enumerate(L))
d1 = {v: str(k) for k, v in d.items()}
print (d)
{0: 'in', 1: 'out'}

print (d1)
{'out': '1', 'in': '0'}

dff1.columns = dff1.columns.to_series().replace(d1,regex=True)
a = pd.wide_to_long(dff1, stubnames=['c', 'd'], j='in/out', i=['a','b']).reset_index()
a['in/out'] = a['in/out'].astype(int).map(d)
a = a[['a','b','c','d','in/out']]
print (a)
   a  b  c  d in/out
0  1  1  3  2     in
1  1  1  4  3    out
2  2  3  3  8     in
3  2  3  1  4    out
4  3  1  3  2     in
5  3  1  1  1    out

Edit: for the reverse process use:
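
The reverse direction, from the long table back to one row per (a, b) pair, could be sketched with set_index and unstack. This is a minimal sketch rather than the answerer's own code; the frame name long_df and the rebuilt column names cin, cout, din, dout are assumptions based on the output printed above:

```python
import pandas as pd

# Long-format frame, copied from the printed output (long_df is a
# hypothetical name used only for this sketch).
long_df = pd.DataFrame({
    'a': [1, 1, 2, 2, 3, 3],
    'b': [1, 1, 3, 3, 1, 1],
    'c': [3, 4, 3, 1, 3, 1],
    'd': [2, 3, 8, 4, 2, 1],
    'in/out': ['in', 'out', 'in', 'out', 'in', 'out'],
})

# Move in/out from rows to columns; this yields two-level column labels
# such as ('c', 'in'), ('c', 'out'), ('d', 'in'), ('d', 'out').
wide = long_df.set_index(['a', 'b', 'in/out']).unstack('in/out')

# Flatten the two levels into single names: ('c', 'in') -> 'cin'.
wide.columns = [stub + suffix for stub, suffix in wide.columns]
wide = wide.reset_index()
print(wide)
```

unstack requires each (a, b, in/out) combination to be unique, which holds for this data; with duplicate keys, an aggregating pivot_table would be needed instead.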