Pandas DataFrame: expand rows containing lists into multiple rows, with the correct index for all columns

Time: 2015-11-19 01:20:00

Tags: python pandas

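I have time-series data in a pandas DataFrame. The index is the time at which each measurement starts, and each column holds a list of values recorded at a fixed sampling rate (sampling interval = difference between consecutive index values / number of elements in the list).

Here is what it looks like: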

Time         A                   B                   .......  Z
0    [1, 2, 3, 4]      [1, 2, 3, 4]
2    [5, 6, 7, 8]      [5, 6, 7, 8]
4    [9, 10, 11, 12]   [9, 10, 11, 12]
6    [13, 14, 15, 16]  [13, 14, 15, 16]
...

I want to expand each row, across all columns, into multiple rows like this:

Time       A           B  .... Z
0          1           1
0.5        2           2
1          3           3
1.5        4           4
2          5           5 
2.5        6           6
.......
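The sampling rule from the description works out to: the j-th value of a row starting at t0 is stamped t0 + j * (gap to the next start time / number of values in the list). A minimal sketch of that arithmetic, assuming the example's constant 2-unit gap between start times and 4 values per list:

```python
# Assumed from the example above: start times 0, 2, 4, 6 and 4 values per list
starts = [0, 2, 4, 6]
n = 4
spacing = starts[1] - starts[0]   # gap between consecutive start times
step = spacing / n                # sampling interval: 2 / 4 = 0.5

# Timestamp of the j-th value in the row that starts at t0
times = [t0 + j * step for t0 in starts for j in range(n)]
print(times[:6])  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```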

So far I've been thinking along these lines (the code doesn't work):

def expand_row(dstruc):
    for i in range (len(dstruc)):
        for j in range (1,len(dstruc[i])):
            dstruc.loc[i+j/len(dstruc[i])] = dstruc[i][j]

    dstruc.loc[i] = dstruc[i][0]
    return dstruc

expanded = testdf.apply(expand_row)

I have also tried split(',') and stack(), but I couldn't get the index right.

3 answers:

Answer 0 (score: 4)

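import pandas as pd

# Sample data: each cell holds a tuple of 4 consecutive samples
df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')},
                  index=range(0, 8, 2))

# One (start time, (A, B, C)) pair per sample
pairs = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
result = pd.DataFrame([values for _, values in pairs],
                      index=[index for index, _ in pairs], columns=df.columns)
result.index.name = 'Time'

# Offset each sample by its position within its start time: 0, 1/4, 2/4, 3/4
grouped = result.groupby(level=0)
increment = grouped.cumcount() / grouped.size()
result.index = pd.Index(result.index + increment.to_numpy(), name='Time')
print(result)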

yields

In [183]: result
Out[183]: 
       A   B   C
Time            
0.00   1   1   1
0.25   2   2   2
0.50   3   3   3
0.75   4   4   4
2.00   5   5   5
2.25   6   6   6
2.50   7   7   7
2.75   8   8   8
4.00   9   9   9
4.25  10  10  10
4.50  11  11  11
4.75  12  12  12
6.00  13  13  13
6.25  14  14  14
6.50  15  15  15
6.75  16  16  16

Explanation

One way to loop over the contents of the lists is a list comprehension:

In [172]: df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')}, index=range(0,8,2))

In [173]: [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
Out[173]: 
[(0, (1, 1, 1)),
 (0, (2, 2, 2)),
 ...
 (6, (15, 15, 15)),
 (6, (16, 16, 16))]
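The key step is zip(*row): each cell of a row is an equal-length tuple, and unpacking the row into zip transposes those tuples into one tuple per sample, with one value per column. A standalone illustration on a single hand-built row (the distinct values per column are made up so the transpose is visible):

```python
import pandas as pd

# One row: each column holds the 4 samples recorded for the same start time
row = pd.Series({'A': (1, 2, 3, 4), 'B': (5, 6, 7, 8), 'C': (9, 10, 11, 12)})

# Iterating a Series yields its values, so zip(*row) transposes the tuples
samples = list(zip(*row))
print(samples)  # [(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]
```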

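Once you have the values in this form, you can build the desired DataFrame from them. The original answer used pd.DataFrame.from_items(..., orient='index', columns=df.columns), which was deprecated in pandas 0.23 and removed in 1.0; on current pandas, pass the values and start times to the DataFrame constructor instead:

pairs = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
result = pd.DataFrame([values for _, values in pairs],
                      index=[index for index, _ in pairs], columns=df.columns)
result.index.name = 'Time'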

yields

In [175]: result
Out[175]: 
       A   B   C
Time            
0      1   1   1
0      2   2   2
...
6     15  15  15
6     16  16  16

To compute the increment to add to the index, group by the index and take the ratio of each group's cumcount to its size:

In [176]: grouped = result.groupby(level=0)
In [177]: increment = grouped.cumcount() / grouped.size()
In [179]: result.index = pd.Index(result.index + increment.to_numpy(), name='Time')
In [199]: result.index
Out[199]: 
Index([0.0, 0.25, 0.5, 0.75, 2.0, 2.25, 2.5, 2.75, 4.0, 4.25, 4.5, 4.75,
       6.0, 6.25, 6.5, 6.75],
      dtype='float64', name='Time')
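Note that cumcount / size spreads the samples over one time unit (0, 0.25, 0.5, 0.75), whereas the question spreads them over the gap to the next start time, which is 2 here (0, 0.5, 1.0, 1.5). A sketch of the scaled version, assuming evenly spaced start times (spacing taken from the first two unique index values); it rebuilds a stand-in for `result` directly so it runs on its own, and looks up group sizes with Index.map instead of relying on Series alignment:

```python
import pandas as pd

# Stand-in for `result` above: 4 samples per start time 0, 2, 4, 6
starts = [0, 2, 4, 6]
result = pd.DataFrame({c: range(1, 17) for c in 'ABC'},
                      index=pd.Index([t for t in starts for _ in range(4)], name='Time'))

grouped = result.groupby(level=0)
# Assumption: start times are evenly spaced, so the first gap applies everywhere
unique_starts = result.index.unique()
spacing = unique_starts[1] - unique_starts[0]

position = grouped.cumcount().to_numpy()                  # 0..3 within each start time
group_size = result.index.map(grouped.size()).to_numpy()  # samples per start time
result.index = pd.Index(result.index.to_numpy() + position / group_size * spacing,
                        name='Time')
print(result.head(6))
```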

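Answer 1 (score: 0)

This may not be ideal, but it can be done with groupby by applying a function that returns an expanded DataFrame for each row (this assumes the time step between rows is fixed at 2.0):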

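import pandas as pd

# Sample data (assumed): here 'Time' is a regular column, not the index
lists = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
df = pd.DataFrame({'Time': [0, 2, 4, 6], 'A': lists, 'B': lists})

def expand(x):
    # x is the one-row group for a single start time; x.name is that start time
    data = {c: x[c].iloc[0] for c in x.columns if c != 'Time'}
    n = len(data['A'])
    step = 2.0 / n  # assumes consecutive start times are 2.0 apart
    data['Time'] = [x.name + i * step for i in range(n)]
    return pd.DataFrame(data)

print(df.groupby('Time').apply(expand).set_index('Time', drop=True))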

Output:

       A   B
Time        
0.0    1   1
0.5    2   2
1.0    3   3
1.5    4   4
2.0    5   5
2.5    6   6
3.0    7   7
3.5    8   8
4.0    9   9
4.5   10  10
5.0   11  11
5.5   12  12
6.0   13  13
6.5   14  14
7.0   15  15
7.5   16  16
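For comparison, on pandas 1.3 or later DataFrame.explode accepts a list of columns, so the expansion can be done without the hardcoded 2.0 or a per-row apply. A minimal sketch, assuming (as in the question) that all lists in a row have the same length and that start times are evenly spaced, with the spacing taken from the first two start times:

```python
import pandas as pd

# Sample data in the question's layout: start time as the index, lists in the cells
lists = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
df = pd.DataFrame({'A': lists, 'B': lists}, index=pd.Index([0, 2, 4, 6], name='Time'))

# One row per list element; the start time is repeated for each element
expanded = df.explode(list(df.columns))

spacing = df.index[1] - df.index[0]                          # assumed constant gap (2 here)
position = expanded.groupby(level=0).cumcount().to_numpy()  # 0..n-1 within each list
length = expanded.index.map(df['A'].map(len)).to_numpy()    # n for each source row
expanded.index = pd.Index(expanded.index.to_numpy() + position / length * spacing,
                          name='Time')
expanded = expanded.astype(int)  # explode leaves object dtype
print(expanded.head(6))
```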

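Answer 2 (score: 0)

Say the DataFrame you want to expand is named df_to_expand; you can then use eval to turn its string-valued cells into real lists before expanding them.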

Reference: covert a string which is a list into a proper list python
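The answer's own code is not preserved here. As a hedged sketch of the idea its reference points to (cells that hold the string form of a list, e.g. after a CSV round-trip), the standard library's ast.literal_eval, a safer alternative to eval for Python literals, can parse the strings back into lists; the contents of df_to_expand below are illustrative sample data, not from the answer:

```python
import ast
import pandas as pd

# Illustrative data: lists stored as strings, as they would be after a CSV round-trip
df_to_expand = pd.DataFrame({'A': ['[1, 2, 3, 4]', '[5, 6, 7, 8]'],
                             'B': ['[1, 2, 3, 4]', '[5, 6, 7, 8]']},
                            index=pd.Index([0, 2], name='Time'))

# Parse each string into a real Python list (literal_eval only accepts literals)
parsed = pd.DataFrame({c: df_to_expand[c].map(ast.literal_eval)
                       for c in df_to_expand.columns})
print(parsed.loc[0, 'A'])  # [1, 2, 3, 4]
```

After this step the lists can be expanded with any of the approaches above.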
