Doing 1 million jobs per minute with python-eventlet

Time: 2014-01-07 09:50:03

Tags: python eventlet

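Use case:

  1. Read data from one server
  2. Manipulate it on my server
  3. Post the data to another server
  4. But the required throughput is 1 million per minute.

    More explanation:

    Suppose there are 10000 customers. For each customer, I need to call 5 APIs and manipulate the data in the responses. The manipulation produces about 30 API calls, through which I want to post the data to another server.

    (Assumption: the server I fetch data from takes 250 ms per API call, and the server I post data to takes 350 ms per POST API call.)

    Pseudocode:

    Every minute, for each customer (there are 10000 customers):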
    
    
    Fetch data from first_server_for_first_service
    Fetch data from first_server_for_second_service
    Fetch data from first_server_for_third_service
    Fetch data from first_server_for_fourth_service
    Fetch data from first_server_for_fifth_service
    
    Manipulate data of first_service
    Manipulate data of second_service
    Manipulate data of third_service
    Manipulate data of fourth_service
    Manipulate data of fifth_service
    
    post data to second_server_for_first_service_1_type
    post data to second_server_for_first_service_2_type
    post data to second_server_for_first_service_3_type
    post data to second_server_for_first_service_4_type
    post data to second_server_for_first_service_5_type
    post data to second_server_for_first_service_6_type
    post data to second_server_for_second_service_1_type
    post data to second_server_for_second_service_2_type
    post data to second_server_for_second_service_3_type
    post data to second_server_for_second_service_4_type
    post data to second_server_for_second_service_5_type
    post data to second_server_for_second_service_6_type
    post data to second_server_for_third_service_1_type
    post data to second_server_for_third_service_2_type
    post data to second_server_for_third_service_3_type
    post data to second_server_for_third_service_4_type
    post data to second_server_for_third_service_5_type
    post data to second_server_for_third_service_6_type
    post data to second_server_for_fourth_service_1_type
    post data to second_server_for_fourth_service_2_type
    post data to second_server_for_fourth_service_3_type
    post data to second_server_for_fourth_service_4_type
    post data to second_server_for_fourth_service_5_type
    post data to second_server_for_fourth_service_6_type
    post data to second_server_for_fifth_service_1_type
    post data to second_server_for_fifth_service_2_type
    post data to second_server_for_fifth_service_3_type
    post data to second_server_for_fifth_service_4_type
    post data to second_server_for_fifth_service_5_type
    post data to second_server_for_fifth_service_6_type
    

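    How can we write code with Eventlet so that it runs this many tasks in parallel? Or is Eventlet even capable of running this many tasks?

    Please reply.

1 answer:

Answer 0: (score: 4)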

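Short answer: this is a tough requirement. If you absolutely cannot reduce the load, I strongly recommend looking at fast languages with built-in concurrency support: Go, Haskell, OCaml. PyPy should also help in this case.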

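10000 * 35 = 350K API calls per minute, or about 6K per second. Given a 350 ms response time, that requires ~2100 concurrent connections to the upstream and downstream services to keep up. Eventlet can hold that many green threads.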
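The estimate above is an application of Little's law (requests in flight = arrival rate * latency). A minimal sketch of the arithmetic, assuming the load is spread evenly across the minute, which the question does not guarantee. The constant names are mine; the numbers come from the question.

```python
CUSTOMERS = 10000
FETCHES_PER_CUSTOMER = 5      # GET calls per customer per minute
POSTS_PER_CUSTOMER = 30       # POST calls per customer per minute
FETCH_LATENCY_S = 0.250       # assumed in the question
POST_LATENCY_S = 0.350        # assumed in the question

calls_per_minute = CUSTOMERS * (FETCHES_PER_CUSTOMER + POSTS_PER_CUSTOMER)
calls_per_second = calls_per_minute / 60

# Upper bound: treat every call as if it took the slower 350 ms.
worst_case_in_flight = calls_per_second * POST_LATENCY_S

# Finer estimate: fetches and posts each with their own latency.
fetch_in_flight = CUSTOMERS * FETCHES_PER_CUSTOMER / 60 * FETCH_LATENCY_S
post_in_flight = CUSTOMERS * POSTS_PER_CUSTOMER / 60 * POST_LATENCY_S

print(calls_per_minute, round(calls_per_second))
print(round(worst_case_in_flight), round(fetch_in_flight + post_in_flight))
```

Without rounding the rate up to 6K per second, the upper bound comes out slightly below the ~2100 quoted above, and most of the connections are on the POST side.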

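But you have a big problem with CPU. The smallest Eventlet overhead I measured on an old Core 2 Duo box was ~25 µs. You only have 166 µs per call (1 second / 6K operations). Good luck doing useful data processing in 140 µs with Python. The good news is that you should be able to handle each group of 1000 clients in a separate process and spread the CPU load across 10 cores.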
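To make the CPU budget and the per-process split concrete, here is a rough sketch. `budget_us` and `shard` are hypothetical helper names, and the model (one process per core, load divided evenly) is an assumption. The 25 µs overhead is the figure measured by the answer's author and will differ on other hardware.

```python
CALLS_PER_SECOND = 350000 / 60     # unrounded; the text rounds this to 6K
EVENTLET_OVERHEAD_US = 25          # measurement quoted in the answer


def budget_us(n_processes):
    """CPU microseconds available per call when the load is split
    evenly across n_processes, each on its own core."""
    return n_processes * 1000000 / CALLS_PER_SECOND


def shard(clients, n_workers):
    """Split clients into n_workers contiguous, nearly equal chunks."""
    base, extra = divmod(len(clients), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(clients[start:start + size])
        start += size
    return chunks


clients = ["client%d" % i for i in range(10000)]
chunks = shard(clients, 10)

for n in (1, 10):
    total = budget_us(n)
    print(n, round(total), round(total - EVENTLET_OVERHEAD_US))
```

Each chunk could then be handed to its own worker process (for example via `multiprocessing.Process`), with every worker running its own Eventlet loop over its 1000 clients.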

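Solving this task with Eventlet does not require any particularly interesting code. The sample code below is probably the simplest approach. Your API calls must be able to reuse existing socket connections. You may want to add concurrency or throughput limits using a queue or a semaphore.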

import time

import eventlet

clients = ['client1', 'client2', ...] # 10K


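def service1(request):
    # API and process stand in for your own HTTP client and processing code.
    data1 = API.get()
    data2 = process(data1)
    # Fire each POST in its own green thread so they run concurrently.
    eventlet.spawn(API.post_type_1, data2)
    eventlet.spawn(API.post_type_2, data2)
    # ...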


def tick():
    now = time.time()
    for client in clients:
        # some context object
        request = (client, now)

        eventlet.spawn(service1, request)
        eventlet.spawn(service2, request)
        eventlet.spawn(service3, request)
        eventlet.spawn(service4, request)
        eventlet.spawn(service5, request)


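def main():
    while True:
        tick()
        # Sleeps a full 60 s after each tick; time spent spawning is not subtracted.
        eventlet.sleep(60)


if __name__ == '__main__':
    main()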
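The answer's suggestion to cap concurrency with a semaphore might look roughly like the sketch below. It uses standard-library threads as a stand-in so that it runs without Eventlet installed. In an Eventlet program, `eventlet.GreenPool(size)` or `eventlet.semaphore.Semaphore` would play the same role with green threads. `MAX_IN_FLIGHT` and `fake_api_call` are hypothetical names, and the cap of 8 is arbitrary.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 8                       # hypothetical cap on concurrent calls
limit = threading.BoundedSemaphore(MAX_IN_FLIGHT)
counter_lock = threading.Lock()
in_flight = 0
peak = 0


def fake_api_call(i):
    """Stand-in for a real API call: sleeps instead of doing network I/O."""
    global in_flight, peak
    with limit:                         # blocks while MAX_IN_FLIGHT calls are running
        with counter_lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)                # simulated latency
        with counter_lock:
            in_flight -= 1
    return i


# More workers than the cap, so the semaphore is what actually limits concurrency.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(fake_api_call, range(100)))

print(len(results), peak <= MAX_IN_FLIGHT)
```

The semaphore bounds how many calls are outstanding at once, which protects both your own process and the remote servers. Limiting throughput (calls per second) rather than concurrency would need a separate mechanism, such as a queue drained at a fixed rate.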