How can I speed up this calculation?

Time: 2020-04-02 06:47:26

Tags: python pandas runtime

I have a dataset that looks like this:

| userId | chatroomID | msg_index_in_chat | time_difference_between_msg |
| --- | --- | --- | --- |
| 1234567891222222 | sdfbsjkfdsdklf | 1 | 0 hours 0 minutes |
| 9876543112252141 | sdfbsjkfdsdklf | 2 | 0 hours 4 minutes |
| 2374623982398939 | quweioqewiieio | 1 | 0 hours 0 minutes |
| 1234567891222222 | quweioqewiieio | 2 | 0 hours 4 minutes |
| 2374623982398939 | quweioqewiieio | 3 | 1 hour 0 minutes |

I need to calculate the average time between messages in each chat room and assign that value to every row of the room. For this I wrote:

    from datetime import timedelta

    # initialise the column, then fill it room by room
    df['avg_time'] = timedelta(0)
    for room in set(df.roomId):
        table = df[['msg_index_in_chat', 'time_difference_between_msg']][df.roomId == room]
        if len(table) > 1:
            times = table.time_difference_between_msg.tolist()
            # skip the first message (its difference is always 0) and average the rest
            avg_time = sum(times[1:], timedelta(0)) / len(times[1:])
        elif len(table) == 1:
            avg_time = timedelta(hours=0)
        df.loc[df.roomId == room, 'avg_time'] = avg_time

The problem is that this code takes a very long time to run. Can you suggest a faster way to do this calculation?

Thanks!

1 answer:

Answer 0 (score: 0)

Use GroupBy.transform with a custom lambda function:

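A minimal sketch of that approach, assuming the chat-room column is named roomId (as in the question's loop) and that time_difference_between_msg already holds timedelta values:

    from datetime import timedelta

    # One groupby pass instead of a Python loop over rooms:
    # for each room, drop the first message (its difference is always 0)
    # and average the remaining gaps; a single-message room gets timedelta(0).
    df['avg_time'] = (
        df.sort_values('msg_index_in_chat')
          .groupby('roomId')['time_difference_between_msg']
          .transform(lambda s: sum(s.iloc[1:], timedelta(0)) / max(len(s) - 1, 1))
    )

Because transform broadcasts each group's result back to every row of that group, the assignment aligns on the original index and no explicit loop over rooms is needed.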