I have a dataset that looks like this:
| userId | chatroomID | msg_index_in_chat | time_difference_between_msg |
| 1234567891222222 | sdfbsjkfdsdklf | 1 | 0 hours 0 minutes |
| 9876543112252141 | sdfbsjkfdsdklf | 2 | 0 hours 4 minutes |
| 2374623982398939 | quweioqewiieio | 1 | 0 hours 0 minutes |
| 1234567891222222 | quweioqewiieio | 2 | 0 hours 4 minutes |
| 2374623982398939 | quweioqewiieio | 3 | 1 hour 0 minutes |
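I need to calculate the average time between messages in each room and assign the resulting value to every row of that room. To do this, I wrote: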
from datetime import timedelta

df['avg_time'] = 0
for room in set(df.roomId):
    table = df[['msg_index_in_chat', 'time_difference_between_msg']][df.roomId == room]
    if len(table) > 1:
        avg_time = []
        times = table.time_difference_between_msg.tolist()
        # Skip the first message of the room: its time difference is always zero
        avg_time = sum(times[1:], timedelta(0)) / len(times[1:])
    elif len(table) == 1:
        avg_time = timedelta(hours=0)
    df.loc[df.roomId == room, 'avg_time'] = avg_time
The problem is that this code takes a very long time to run. Can you suggest a faster way to do this calculation?
Thanks!
Answer 0: (score: 0)
Use GroupBy.transform with a custom lambda function:
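A minimal sketch of that approach, assuming the column names used in the question's code (`roomId`, `time_difference_between_msg`), that the time differences are already timedelta values, and that rows are ordered by `msg_index_in_chat` within each room. The sample frame is made up to mirror the question's table. As in the original loop, the first message of each room is skipped because its difference is always zero, and rooms with a single message get a zero average:

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question
df = pd.DataFrame({
    "userId": [1234567891222222, 9876543112252141, 2374623982398939,
               1234567891222222, 2374623982398939],
    "roomId": ["sdfbsjkfdsdklf", "sdfbsjkfdsdklf",
               "quweioqewiieio", "quweioqewiieio", "quweioqewiieio"],
    "msg_index_in_chat": [1, 2, 1, 2, 3],
    # Time differences in minutes, converted to timedelta values
    "time_difference_between_msg": pd.to_timedelta([0, 4, 0, 4, 60], unit="min"),
})

def avg_gap(times):
    # Average of all gaps except the first one (always zero);
    # a room with a single message gets a zero average.
    if len(times) > 1:
        return times.iloc[1:].mean()
    return pd.Timedelta(0)

# transform broadcasts each room's scalar result back to all of its rows
df["avg_time"] = (
    df.groupby("roomId")["time_difference_between_msg"]
      .transform(lambda times: avg_gap(times))
)

print(df[["roomId", "msg_index_in_chat", "avg_time"]])
```

`transform` returns a result aligned with the original index, so the per-room average is assigned to every row in one pass instead of re-filtering the whole frame for each room.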