Python Pandas - Iterating over the unique values of a column

Time: 2018-03-01 08:08:04

Tags: python python-3.x pandas loops dataframe

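I am trying to iterate over the unique values of a column so that I can build a dictionary with three different keys, each holding a list of dictionaries. This is my current code: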

import pandas as pd

dataDict = {}
metrics = frontendFrame['METRIC'].unique()

for metric in metrics:
    dataDict[metric] = frontendFrame[frontendFrame['METRIC'] == metric].to_dict('records')

print(dataDict)

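This works for small amounts of data, but as the data grows it can take almost a second (!!!!).

I tried groupby in pandas, which was even slower, and also map, but I don't want the result returned as a list. How can I iterate over this and build what I want in a faster way? I am using Python 3.6.

Update:

Input: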

    DATETIME             METRIC  ANOMALY           VALUE
0   2018-02-27 17:30:32  SCORE      2.0                    -1.0
1   2018-02-27 17:30:32  VALUE      NaN                     0.0
2   2018-02-27 17:30:32  INDEX      NaN  6.6613381477499995E-16
3   2018-02-27 17:31:30  SCORE      2.0                    -1.0
4   2018-02-27 17:31:30  VALUE      NaN                     0.0
5   2018-02-27 17:31:30  INDEX      NaN  6.6613381477499995E-16
6   2018-02-27 17:32:30  SCORE      2.0                    -1.0
7   2018-02-27 17:32:30  VALUE      NaN                     0.0
8   2018-02-27 17:32:30  INDEX      NaN  6.6613381477499995E-16

Output:

{
  "INDEX": [
{
  "DATETIME": 1519759710000,
  "METRIC": "INDEX",
  "ANOMALY": null,
  "VALUE": "6.6613381477499995E-16"
},
{
  "DATETIME": 1519759770000,
  "METRIC": "INDEX",
  "ANOMALY": null,
  "VALUE": "6.6613381477499995E-16"
}],
  "SCORE": [
{
  "DATETIME": 1519759710000,
  "METRIC": "SCORE",
  "ANOMALY": 2,
  "VALUE": "-1.0"
},
{
  "DATETIME": 1519759770000,
  "METRIC": "SCORE",
  "ANOMALY": 2,
  "VALUE": "-1.0"
}],
  "VALUE": [
{
  "DATETIME": 1519759710000,
  "METRIC": "VALUE",
  "ANOMALY": null,
  "VALUE": "0.0"
},
{
  "DATETIME": 1519759770000,
  "METRIC": "VALUE",
  "ANOMALY": null,
  "VALUE": "0.0"
}]
}

1 Answer:

Answer 0: (score: 1)

One possible solution:

from collections import defaultdict

# Dict comprehension used only for its side effect of appending to a
a = defaultdict(list)
_ = {x['METRIC']: a[x['METRIC']].append(x) for x in frontendFrame.to_dict('records')}
a = dict(a)

# The same grouping written as a plain loop
a = defaultdict(list)
for x in frontendFrame.to_dict('records'):
    a[x['METRIC']].append(x)
a = dict(a)
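
To try the loop-based grouping on the sample rows from the question, here is a minimal, self-contained sketch. The helper name `group_records` is hypothetical and not part of the answer, and the frame is rebuilt by hand from the first six input rows, so the dtypes may differ from the real `frontendFrame`.

```python
from collections import defaultdict

import pandas as pd

# Stand-in for frontendFrame, rebuilt by hand from the question's sample rows
# (assumption: the real frame may use different dtypes).
frontendFrame = pd.DataFrame({
    "DATETIME": pd.to_datetime([
        "2018-02-27 17:30:32", "2018-02-27 17:30:32", "2018-02-27 17:30:32",
        "2018-02-27 17:31:30", "2018-02-27 17:31:30", "2018-02-27 17:31:30",
    ]),
    "METRIC": ["SCORE", "VALUE", "INDEX", "SCORE", "VALUE", "INDEX"],
    "ANOMALY": [2.0, None, None, 2.0, None, None],
    "VALUE": [-1.0, 0.0, 6.6613381477499995e-16,
              -1.0, 0.0, 6.6613381477499995e-16],
})


def group_records(frame, key):
    """Convert the frame to records once, then bucket the records by `key`.

    Hypothetical helper: it wraps the defaultdict loop so the frame is
    converted a single time instead of being filtered once per unique value.
    """
    grouped = defaultdict(list)
    for record in frame.to_dict("records"):
        grouped[record[key]].append(record)
    return dict(grouped)


dataDict = group_records(frontendFrame, "METRIC")
print(sorted(dataDict))        # ['INDEX', 'SCORE', 'VALUE']
print(len(dataDict["SCORE"]))  # 2
```

The difference from the code in the question is that the frame is scanned once in total, rather than once per unique METRIC value. Whether that is fast enough depends on the data, so it is worth timing on the real frame.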

Slower:

dataDict = frontendFrame.groupby('METRIC').apply(lambda x: x.to_dict('records')).to_dict()
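
As a variation on the groupby approach, the groupby object can also be iterated directly, which avoids `apply` and the intermediate Series. This is only a sketch on a reduced, hand-made frame (two columns, values shortened), and no speed claim is made here: it should be measured against the other versions on the real data.

```python
import pandas as pd

# Reduced stand-in for frontendFrame (assumption: only two columns kept).
frontendFrame = pd.DataFrame({
    "METRIC": ["SCORE", "VALUE", "INDEX", "SCORE", "VALUE", "INDEX"],
    "VALUE": [-1.0, 0.0, 6.66e-16, -1.0, 0.0, 6.66e-16],
})

# Iterating over a groupby object yields (key, sub-frame) pairs,
# so each sub-frame can be converted to records directly.
dataDict = {
    metric: group.to_dict("records")
    for metric, group in frontendFrame.groupby("METRIC")
}

print(list(dataDict))  # ['INDEX', 'SCORE', 'VALUE'] (groupby sorts keys by default)
```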