Pandas: divide a DataFrame by its index values

Date: 2016-08-04 10:09:24

Tags: python pandas indexing dataframe

I am trying to divide every column of a DataFrame (1221 rows, 1000 columns) by the index:

           5000058004097  5000058022936  5000058036940  5000058036827  \

91.0        3.667246e+10   3.731947e+12   2.792220e+14   2.691262e+13   
94.0        9.869027e+10   1.004314e+13   7.514220e+14   7.242529e+13   
96.0        2.536914e+11   2.581673e+13   1.931592e+15   1.861752e+14
...

This is the code I tried:

A = SHIGH.divide(SHIGH.index, axis =1) 

I get this error:

ValueError: operands could not be broadcast together with shapes (1221,1000) (1221,) 

I also tried:

A = SHIGH.divide(SHIGH.index.values.tolist(), axis =1)

I also tried resetting the index and dividing by the resulting column, and got the same error.

I would appreciate it if someone could point out my mistake.

4 Answers:

Answer 0 (score: 1)

You need to convert the Index object to a Series and divide along the rows (axis=0):

df.div(df.index.to_series(), axis=0)

Example:

In [118]:
df = pd.DataFrame(np.random.randn(5,3))
df

Out[118]:
          0         1         2
0  0.828540 -0.574005 -0.535122
1 -0.126242  2.152599 -1.356933
2  0.289270 -0.663178 -0.374691
3 -0.016866 -0.760110 -1.696402
4  0.130580 -1.043561  0.789491

In [124]:
df.div(df.index.to_series(), axis=0)

Out[124]:
          0         1         2
0       inf      -inf      -inf
1 -0.126242  2.152599 -1.356933
2  0.144635 -0.331589 -0.187345
3 -0.005622 -0.253370 -0.565467
4  0.032645 -0.260890  0.197373

Answer 1 (score: 1)

You need to convert the index with to_series and then divide with div along axis=0:

print (SHIGH.divide(SHIGH.index.to_series(), axis = 0))
      5000058004097  5000058022936  5000058036940  5000058036827
91.0   4.029941e+08   4.101041e+10   3.068374e+12   2.957431e+11
94.0   1.049896e+09   1.068419e+11   7.993851e+12   7.704818e+11
96.0   2.642619e+09   2.689243e+11   2.012075e+13   1.939325e+12
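to_series() produces a Series whose index and values are both the row labels, and div(..., axis=0) matches a Series to the rows by label rather than by position. The sketch below uses only the first column of the SHIGH sample (values copied from the question) and divides by a deliberately reversed copy of that Series; each row is still divided by its own label.

```python
import pandas as pd

# First column of the SHIGH sample shown in the question.
shigh = pd.DataFrame(
    {"5000058004097": [36672460000.0, 98690270000.0, 253691400000.0]},
    index=[91.0, 94.0, 96.0],
)

# to_series() gives a Series whose index and values are both the row labels.
divisor = shigh.index.to_series()

# Deliberately reverse it: same labels, opposite order.
reversed_divisor = divisor.iloc[::-1]

# div(..., axis=0) aligns the Series to the rows by label, not by position,
# so each row is still divided by its own index value.
by_label = shigh.div(reversed_divisor, axis=0)
print(by_label)
```

This label alignment is what distinguishes the to_series() variant from passing a plain NumPy array, which is matched purely by position.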

Timings are the same for both solutions:

import pandas as pd

SHIGH = pd.DataFrame({'5000058022936': {96.0: 25816730000000.0, 91.0: 3731947000000.0, 94.0: 10043140000000.0}, 
                 '5000058036940': {96.0: 1931592000000000.0, 91.0: 279222000000000.0, 94.0: 751422000000000.0}, 
                 '5000058036827': {96.0: 186175200000000.0, 91.0: 26912620000000.0, 94.0: 72425290000000.0}, 
                 '5000058004097': {96.0: 253691400000.0, 91.0: 36672460000.0, 94.0: 98690270000.0}})


print (SHIGH)
      5000058004097  5000058022936  5000058036827  5000058036940
91.0   3.667246e+10   3.731947e+12   2.691262e+13   2.792220e+14
94.0   9.869027e+10   1.004314e+13   7.242529e+13   7.514220e+14
96.0   2.536914e+11   2.581673e+13   1.861752e+14   1.931592e+15

# Enlarge the sample to 1200 rows x 1000 columns; reset_index(drop=True) replaces the float index with 0..1199
SHIGH = pd.concat([SHIGH]*400).reset_index(drop=True)
SHIGH = pd.concat([SHIGH]*250, axis=1)

In [212]: %timeit (SHIGH.divide(SHIGH.index.values, axis = 0))
100 loops, best of 3: 14.8 ms per loop

In [213]: %timeit (SHIGH.divide(SHIGH.index.to_series(), axis = 0))
100 loops, best of 3: 14.9 ms per loop
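%timeit is an IPython magic. A rough plain-Python equivalent using the standard-library timeit module is sketched below. It assumes a randomly filled 1200 x 1000 frame (hypothetical data standing in for the tiled SHIGH) with the default 0..1199 index, checks that both variants return the same values, and then prints a per-call time. Absolute numbers depend on the machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the enlarged SHIGH: 1200 rows x 1000 columns,
# with the default RangeIndex 0..1199 that reset_index(drop=True) produces.
rng = np.random.default_rng(0)
shigh = pd.DataFrame(rng.random((1200, 1000)))


def with_values():
    # Divide by the raw NumPy array of index values (matched by position).
    return shigh.divide(shigh.index.values, axis=0)


def with_series():
    # Divide by the index converted to a Series (matched by label).
    return shigh.divide(shigh.index.to_series(), axis=0)


# Both variants should agree before we compare speed (row 0 becomes inf in both).
same = with_values().equals(with_series())

for fn in (with_values, with_series):
    # Best of 3 repeats, 5 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```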

Answer 2 (score: 1)

Another way is:

df.div(df.index.values, axis=0)

Example:

In [7]: df = pd.DataFrame({'a': range(5), 'b': range(1, 6), 'c': range(2, 7)}).set_index('a')

In [8]: df.divide(df.index.values, axis=0)
Out[8]: 
          b         c
a                    
0       inf       inf
1  2.000000  3.000000
2  1.500000  2.000000
3  1.333333  1.666667
4  1.250000  1.500000
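The shapes in the question's error message can be explained at the NumPy level. NumPy matches trailing dimensions, so a (1221,) array is paired with the 1000 columns of a (1221, 1000) array, and the operation fails. Reshaping the divisor into an (n, 1) column makes it broadcast across each row instead. A minimal sketch, rebuilding the same frame as in the example above and working on raw arrays (this bypasses pandas alignment, which is safe here only because the divisor comes from the same frame's index):

```python
import numpy as np
import pandas as pd

# Same frame as in the example above: index a = 0..4, columns b and c.
df = pd.DataFrame({"a": range(5), "b": range(1, 6), "c": range(2, 7)}).set_index("a")

values = df.to_numpy(dtype=float)         # shape (5, 2)
divisor = df.index.to_numpy(dtype=float)  # shape (5,)

# values / divisor would fail: NumPy aligns the trailing axes, pairing the
# 5 index values with the 2 columns. divisor[:, np.newaxis] has shape (5, 1),
# which broadcasts across each row instead.
with np.errstate(divide="ignore"):  # row 0 divides by 0 and becomes inf
    quotient = values / divisor[:, np.newaxis]

# Wrap the result back into a DataFrame with the original labels.
result = pd.DataFrame(quotient, index=df.index, columns=df.columns)
print(result)
```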

Answer 3 (score: 0)

SHIGH / SHIGH.index

df.index provides an array-like structure that stores the index.
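A caveat with this approach, assuming a recent pandas version: the plain / operator pairs a 1-D array-like such as df.index with the columns, the same way div(..., axis='columns') does. The sketch below uses hypothetical frames and values. On a square frame, / silently divides column j by the j-th index value. On a frame with more rows than columns, like SHIGH, it raises a ValueError. The row-wise div(..., axis=0) from the other answers is shown for comparison.

```python
import pandas as pd

# Hypothetical square frame: 3 rows, 3 columns, every cell 8.0.
square = pd.DataFrame(8.0, index=[1.0, 2.0, 4.0], columns=["x", "y", "z"])

# `/` pairs a 1-D array-like with the COLUMNS: x / 1, y / 2, z / 4.
col_wise = square / square.index

# Row-wise division, as in the other answers: row 1.0 / 1, row 2.0 / 2, row 4.0 / 4.
row_wise = square.div(square.index.to_series(), axis=0)

# Hypothetical non-square frame (3 rows, 2 columns), shaped like SHIGH.
tall = pd.DataFrame(8.0, index=[1.0, 2.0, 4.0], columns=["x", "y"])
try:
    tall / tall.index
    tall_error = None
except ValueError as exc:
    tall_error = exc

print(col_wise, row_wise, sep="\n\n")
```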