pandas: efficiently convert column names to variables

Time: 2015-06-15 17:20:10

Tags: python pandas

I have the following data frame:

{'2003-12-02LVDT0023': {0: 2.3407617000000001e-06,
  1: 2.3402380999999998e-06,
  2: 2.3410341000000001e-06,
  3: 2.3417209999999999e-06,
  4: 2.3419282000000002e-06,
  5: 2.3420178e-06,
  6: 2.3424012999999999e-06},
 '2003-12-02LVDT0024': {0: 2.3612594999999998e-06,
  1: 2.3609533999999999e-06,
  2: 2.3611187000000002e-06,
  3: 2.3618049e-06,
  4: 2.3621773999999998e-06,
  5: 2.3626039000000002e-06,
  6: 2.3625455000000001e-06},
 '2003-12-02LVDT0025': {0: 2.3660825000000001e-06,
  1: 2.3660903000000001e-06,
  2: 2.3659481000000001e-06,
  3: 2.3661921e-06,
  4: 2.3668378999999998e-06,
  5: 2.3671985e-06,
  6: 2.3679653999999999e-06},
 '2003-12-02force0023': {0: 2.3664842999999999e-06,
  1: 2.3664650000000002e-06,
  2: 2.3666738999999999e-06,
  3: 2.3665972999999999e-06,
  4: 2.3670195e-06,
  5: 2.3675174999999997e-06,
  6: 2.3677449e-06},
 '2003-12-02force0024': {0: 2.3680921e-06,
  1: 2.3682342000000004e-06,
  2: 2.3684212999999998e-06,
  3: 2.3688697000000001e-06,
  4: 2.3694958999999999e-06,
  5: 2.3698856000000002e-06,
  6: 2.3702362000000002e-06},
 '2003-12-02force0025': {0: 2.3684941000000001e-06,
  1: 2.3691163999999997e-06,
  2: 2.3693348999999999e-06,
  3: 2.3694661000000002e-06,
  4: 2.3701970999999998e-06,
  5: 2.3704627000000002e-06,
  6: 2.3707437000000001e-06}}

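I would like to reshape the data frame so that I have one column identifying each data point (the last digits in the header) and two value columns for each data point (LVDT and force). The data frame itself has 40000 rows.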

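What I did with this small part of the data is:

    # I cannot use `inplace=True` here
    # flatten to long format: one row per (column name, row index) pair
    new = new.unstack().reset_index()
    # take the last four characters of the original column name as the id
    new['id'] = new.level_0.str[-4:]
    # convert_objects was the API at the time; it has since been removed from pandas
    new = new.convert_objects(convert_numeric=True)
    new

This gives me the last four digits of each column name as a new column. From there I could probably reshape the data frame somehow. But doing this on the original data frame produces a new data frame with 15640000 rows and an extra 1 GB of memory.

What I want is a data frame like the following:

[desired data frame missing in the source]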

1 Answer:

Answer 0 (score: 0)

This should get you at least close to the data frame you want:

  1. Replace the column index with a hierarchical index:

    import pandas as pd

    # split each header into (date, factor, id), e.g. ('2003-12-02', 'LVDT', '23')
    ind = [(t[0:10], t[10:-4], t[-2:]) for t in df.columns]
    newcol = pd.MultiIndex.from_tuples(ind, names=['date', 'factor', 'id'])
    df.columns = newcol
    
  2. Use `stack` followed by `reset_index` to convert the `date` and `id` column labels into columns:

    df = df.stack(level=['date', 'id']).reset_index([1,2])
    df.index = range(len(df))
    
  3. The last line gives you a unique index. You can of course use something more meaningful instead.
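
The steps above can be put together as a minimal end-to-end sketch. It uses a shortened four-column sample in the same shape as the question's data, not the full 40000-row frame. Two choices here are assumptions rather than part of the answer: the id is taken from the last four characters of the header (as in the question's own code) instead of the last two, and the result is stored under the illustrative name `long_df`.

```python
import pandas as pd

# Shortened sample with the same header pattern as the question's data.
df = pd.DataFrame({
    '2003-12-02LVDT0023': [2.3407617e-06, 2.3402381e-06],
    '2003-12-02LVDT0024': [2.3612595e-06, 2.3609534e-06],
    '2003-12-02force0023': [2.3664843e-06, 2.3664650e-06],
    '2003-12-02force0024': [2.3680921e-06, 2.3682342e-06],
})

# Step 1: split each header into (date, factor, id).
# Assumption: the id is the last four characters, e.g. '0023'.
df.columns = pd.MultiIndex.from_tuples(
    [(t[:10], t[10:-4], t[-4:]) for t in df.columns],
    names=['date', 'factor', 'id'],
)

# Step 2: move the date and id levels from the columns into the rows,
# leaving one value column per factor (LVDT and force).
long_df = df.stack(level=['date', 'id']).reset_index(['date', 'id'])

# Step 3: replace the leftover row labels with a plain unique index.
long_df = long_df.reset_index(drop=True)
long_df.columns.name = None

print(long_df)
```

On the full data set the repeated `date` and `id` strings may account for much of the memory. Converting those two columns with `astype('category')` is one possible way to reduce it, though that was not measured here.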