Pivot data while preserving the original sort order

Posted: 2014-01-17 09:52:14

Tags: python pandas google-visualization

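I want to pivot data produced by a Django queryset while keeping the original (non-alphabetical) sort order of the index column. The pivoted data will then be fed to a Google Visualization line chart.

I have hacked together my own code that does this, but it is a bit ugly, and I'd like to know whether it can be done with a pandas DataFrame instead.

I had never used pandas before, and after reading the docs this is what I came up with.

Here is my unpivoted DataFrame, sorted by date and tenor. The tenor suffixes mean: D = day, M = month, Y = year.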

from pandas import DataFrame
df = DataFrame(data)  # data: the rows returned by the Django queryset

               date tenor     value
0        2014-01-01    1D  0.517125
1        2014-01-01    1M    0.5175
2        2014-01-01    2M  0.518159
3        2014-01-01    3M    0.5187
4        2014-01-01    4M   0.51912
5        2014-01-01    5M   0.51949
6        2014-01-01    6M    0.5197
7        2014-01-01    9M  0.519511
8        2014-01-01    1Y    0.5198
9        2014-01-01   18M  0.521228
10       2014-01-01    2Y  0.523097
11       2014-01-01    3Y  0.525054
12       2014-01-01    4Y  0.527055
13       2014-01-01    5Y  0.529054
14       2014-01-01    6Y  0.531099
15       2014-01-01    7Y  0.532852
16       2014-01-01    8Y  0.534207
17       2014-01-01    9Y  0.535314
18       2014-01-02    1D  0.517874
19       2014-01-02    1M    0.5181
20       2014-01-02    2M  0.518451
21       2014-01-02    3M    0.5188
22       2014-01-02    4M  0.519113
23       2014-01-02    5M  0.519418
24       2014-01-02    6M    0.5196
25       2014-01-02    9M  0.519377
26       2014-01-02    1Y    0.5197
27       2014-01-02   18M  0.521406
28       2014-01-02    2Y  0.523405
29       2014-01-02    3Y  0.525254
30       2014-01-02    4Y  0.527151
31       2014-01-02    5Y  0.529256
32       2014-01-02    6Y  0.531543
33       2014-01-02    7Y  0.533457
34       2014-01-02    8Y  0.534802
35       2014-01-02    9Y  0.535847
36       2014-01-03    1D  0.518552
37       2014-01-03    1M    0.5186
38       2014-01-03    2M  0.518536
39       2014-01-03    3M    0.5186
40       2014-01-03    4M  0.518865
41       2014-01-03    5M   0.51916
42       2014-01-03    6M    0.5193
43       2014-01-03    9M  0.519024
44       2014-01-03    1Y    0.5193
45       2014-01-03   18M  0.520882
46       2014-01-03    2Y    0.5228
47       2014-01-03    3Y  0.524647
48       2014-01-03    4Y  0.526752
49       2014-01-03    5Y  0.528957
50       2014-01-03    6Y  0.531065
51       2014-01-03    7Y  0.532856
52       2014-01-03    8Y  0.534325
53       2014-01-03    9Y  0.535558

Using pandas pivot gives the following result. The pivot works, but the rows come out in the wrong order:

df_pivot = df.pivot(index='tenor', columns='date', values='value')
tenor            2014-01-01 2014-01-02 2014-01-03
18M                0.521228   0.521406   0.520882
1D                 0.517125   0.517874   0.518552
1M                   0.5175     0.5181     0.5186
1Y                   0.5198     0.5197     0.5193
2M                 0.518159   0.518451   0.518536
2Y                 0.523097   0.523405     0.5228
3M                   0.5187     0.5188     0.5186
3Y                 0.525054   0.525254   0.524647
4M                  0.51912   0.519113   0.518865
4Y                 0.527055   0.527151   0.526752
5M                  0.51949   0.519418    0.51916
5Y                 0.529054   0.529256   0.528957
6M                   0.5197     0.5196     0.5193
6Y                 0.531099   0.531543   0.531065
7Y                 0.532852   0.533457   0.532856
8Y                 0.534207   0.534802   0.534325
9M                 0.519511   0.519377   0.519024
9Y                 0.535314   0.535847   0.535558
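A minimal reproduction on a four-tenor subset of the data above suggests the cause: pivot() builds the row index from the sorted unique tenor labels, and as plain strings "18M" sorts before "1D" because the character '8' sorts before 'D'. The subset is chosen only for illustration.

```python
import pandas as pd

# A small subset of the question's long-format data (values copied from the table above)
df = pd.DataFrame({
    "date": ["2014-01-01"] * 4 + ["2014-01-02"] * 4,
    "tenor": ["1D", "1M", "1Y", "18M"] * 2,
    "value": [0.517125, 0.5175, 0.5198, 0.521228,
              0.517874, 0.5181, 0.5197, 0.521406],
})

df_pivot = df.pivot(index="tenor", columns="date", values="value")

# The tenor labels are strings, so the pivoted index is in lexicographic order
print(list(df_pivot.index))  # ['18M', '1D', '1M', '1Y']
```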

I would like the result ordered by tenor, from shortest to longest maturity:

tenor            2014-01-01 2014-01-02 2014-01-03
1D                 0.517125   0.517874   0.518552
1M                   0.5175     0.5181     0.5186
2M                 0.518159   0.518451   0.518536
3M                   0.5187     0.5188     0.5186
4M                  0.51912   0.519113   0.518865
5M                  0.51949   0.519418    0.51916
6M                   0.5197     0.5196     0.5193
9M                 0.519511   0.519377   0.519024
1Y                   0.5198     0.5197     0.5193
18M                0.521228   0.521406   0.520882
2Y                 0.523097   0.523405     0.5228
3Y                 0.525054   0.525254   0.524647
4Y                 0.527055   0.527151   0.526752
5Y                 0.529054   0.529256   0.528957
6Y                 0.531099   0.531543   0.531065
7Y                 0.532852   0.533457   0.532856
8Y                 0.534207   0.534802   0.534325
9Y                 0.535314   0.535847   0.535558

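I have considered writing a custom sort function that converts each tenor to a number of days for comparison, and then using it with pandas (though I'm not sure how).

I also looked into the Google Visualization pivot, but it seems to work only on queries, not on an existing DataTable.

Any other suggestions would be much appreciated.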
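The custom-sort idea can be plugged into pandas directly, assuming pandas 1.1 or later, where DataFrame.sort_index accepts a key callable that receives the whole Index. The helper names `DAYS_PER_UNIT` and `tenor_to_days` are hypothetical, and treating a month as 365/12 days is an assumption, not a market convention.

```python
import pandas as pd

# Approximate days per tenor unit (assumption: 365-day year, 1 month = 365/12 days)
DAYS_PER_UNIT = {"D": 1.0, "M": 365.0 / 12, "Y": 365.0}


def tenor_to_days(tenor: str) -> float:
    """Convert a tenor label such as '18M' to an approximate number of days."""
    number, unit = tenor[:-1], tenor[-1]
    return int(number) * DAYS_PER_UNIT[unit]


# A pivoted frame with rows in the alphabetical order that pivot() produces
df_pivot = pd.DataFrame(
    {"2014-01-01": [0.521228, 0.517125, 0.5175, 0.5198, 0.523097]},
    index=pd.Index(["18M", "1D", "1M", "1Y", "2Y"], name="tenor"),
)

# sort_index passes the Index to the key; map every label to its day count
df_sorted = df_pivot.sort_index(key=lambda idx: idx.map(tenor_to_days))
print(list(df_sorted.index))  # ['1D', '1M', '1Y', '18M', '2Y']
```

The key returns numbers rather than strings, so the rows are compared by length in days and "1Y" (365) correctly lands before "18M" (547.5).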
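If the pivot is done in pandas instead, the result can be handed to the chart as a plain 2D array. This sketch assumes the page builds its DataTable with google.visualization.arrayToDataTable, which takes an array whose first row holds the column labels; plotting one series per date with tenors along the x-axis is also an assumption.

```python
import json

import pandas as pd

# A tiny pivoted frame already in the desired tenor order
df_pivot = pd.DataFrame(
    {"2014-01-01": [0.517125, 0.5175], "2014-01-02": [0.517874, 0.5181]},
    index=pd.Index(["1D", "1M"], name="tenor"),
)

# Header row: the index name, then one column label per date (one line series per date)
header = [df_pivot.index.name] + [str(c) for c in df_pivot.columns]
# Data rows: each tenor label followed by its values, kept in the frame's row order
rows = [[label] + values
        for label, values in zip(df_pivot.index, df_pivot.values.tolist())]

table = [header] + rows
print(json.dumps(table))
```

Because the rows are emitted in the DataFrame's own order, whatever ordering was applied in pandas carries straight through to the chart.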

1 Answer

Answer 0 (score: 2)

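Comparing day units with month units is ambiguous: which is larger, 30D or 1M? If that doesn't matter for your data, you can use the reindex() method to reorder the DataFrame: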

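import pandas as pd

df_pivot = df.pivot(index='tenor', columns='date', values='value')

# Approximate number of days per tenor unit (1 month = 365/12 days)
DayCounts = {"D": 1, "M": 365.0/12, "Y": 365}
# Sort the tenor labels by their length in days, e.g. '18M' -> 18 * 365/12
index = sorted(df_pivot.index, key=lambda v: int(v[:-1]) * DayCounts[v[-1]])

# reindex() returns a new DataFrame rather than modifying df_pivot in place
df_pivot = df_pivot.reindex(index)
print(df_pivot)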

Output:

date  2014-01-01  2014-01-02  2014-01-03
1D      0.517125    0.517874    0.518552
1M      0.517500    0.518100    0.518600
2M      0.518159    0.518451    0.518536
3M      0.518700    0.518800    0.518600
4M      0.519120    0.519113    0.518865
5M      0.519490    0.519418    0.519160
6M      0.519700    0.519600    0.519300
9M      0.519511    0.519377    0.519024
1Y      0.519800    0.519700    0.519300
18M     0.521228    0.521406    0.520882
2Y      0.523097    0.523405    0.522800
3Y      0.525054    0.525254    0.524647
4Y      0.527055    0.527151    0.526752
5Y      0.529054    0.529256    0.528957
6Y      0.531099    0.531543    0.531065
7Y      0.532852    0.533457    0.532856
8Y      0.534207    0.534802    0.534325
9Y      0.535314    0.535847    0.535558
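Since the queryset already delivers the rows sorted by tenor, another option that sidesteps the 30D-vs-1M question is to reuse the order in which tenors first appear. This is a sketch on a subset of the data, and it assumes the first occurrences are already in the desired order (for example, the first date carries the full set of tenors); a tenor that only shows up on a later date would be placed after all the others.

```python
import pandas as pd

# Long-format rows already in the desired tenor order (subset of the question's data)
df = pd.DataFrame({
    "date": ["2014-01-01"] * 4 + ["2014-01-02"] * 4,
    "tenor": ["1D", "9M", "1Y", "18M"] * 2,
    "value": [0.517125, 0.519511, 0.5198, 0.521228,
              0.517874, 0.519377, 0.5197, 0.521406],
})

# pd.unique keeps labels in order of first appearance and does not sort them
tenor_order = pd.unique(df["tenor"])

df_pivot = df.pivot(index="tenor", columns="date", values="value")
# reindex returns a new DataFrame, so assign the result
df_pivot = df_pivot.reindex(tenor_order)
print(list(df_pivot.index))  # ['1D', '9M', '1Y', '18M']
```

No day-count convention is needed here, because the ordering comes straight from the source data rather than from parsing the tenor labels.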