Question

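I have the following concurrent_api_call_and_processing() method, which takes these parameters:

api_call: an HTTP request to an external website that retrieves an XML document
lst: the list of integers (ids) needed by api_call
callback_processing: a local method that only parses each XML response

I make about 500 HTTP requests with api_call(), one for each id in lst. Each response is then processed by the local method callback_processing(), which parses the XML and returns a tuple.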

from concurrent.futures import ThreadPoolExecutor, as_completed
from queue import Queue

def concurrent_api_call_and_processing(api_call=None, callback_processing=None, lst=None, workers=5):
    """
    :param api_call: Function that will be called concurrently. An API call to API_Provider for each entry.
    :param lst: List of finding ids needed by the API function to call the API_Provider endpoint.
    :param callback_processing: Function that will be called after we get the response from the above API call.
    :param workers: Number of concurrent threads that will be used.
    :return: Queue of tuples containing the details of each particular finding.
    """

    output = Queue()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        future_to_f_detail = {executor.submit(api_call, id): id for id in lst}
        for future in as_completed(future_to_f_detail):
            try:
                find_details = future.result()
            except Exception as exc:
                # Look the id up from the future; a bare `id` here would be the builtin.
                print(f"Finding {future_to_f_detail[future]} generated an exception: {exc}")
            else:
                f_det = callback_processing(find_details)
                output.put(f_det)
    return output

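While using this method I started noticing some random problems (the program did not terminate normally).

I was originally using an array instead of a queue (output = []), but I was not sure whether that could lead to a race condition, so I decided to refactor the code and start using a Queue (output = Queue()).

My question is:

Is my code, as it stands now, free of race conditions?

Note: I want to point out that, following Raymond Hettinger's Keynote on Concurrency, PyBay 2017, I added sleep calls for testing, but I could not determine whether I actually had a race condition.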

Answer 1

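I don't think there is enough information to determine this.

Consider what happens if you pass in an api_call function that increments a global variable: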

count = 0

def api_call_fn(finding_id):
    global count
    # Not atomic: read, add, write back. Threads can interleave between these steps.
    count += 1

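When this is executed concurrently, there will be a race condition on incrementing the count variable.

The same goes for the callback_processing function.

To audit whether this code is free of race conditions, we would have to look at the definitions of both of those functions :)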
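
If api_call really did touch shared state like this, one common option would be to guard the update with a lock. The following is only a sketch under that assumption: api_call_fn_locked and count_lock are hypothetical names, and the actual HTTP request is left out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

count = 0
count_lock = threading.Lock()  # hypothetical lock guarding `count`


def api_call_fn_locked(finding_id):
    """Hypothetical stand-in for api_call; the real HTTP request is omitted."""
    global count
    # `count += 1` is a read-modify-write, so only one thread at a time
    # is allowed to perform it.
    with count_lock:
        count += 1
    return finding_id


with ThreadPoolExecutor(max_workers=5) as executor:
    # executor.map returns results in the order of the inputs.
    results = list(executor.map(api_call_fn_locked, range(500)))
```

With the lock in place every increment is applied, so count ends up equal to the number of calls. Without it, some increments may be lost depending on how the threads interleave.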

Answer 2

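Under the conditions described above, there is no race condition in this code. According to the concurrent.futures documentation, this is what happens:

executor.submit(): returns a Future object representing the execution of the callable.
as_completed(future_to_f_detail): returns an iterator over the Future instances given by future_to_f_detail that yields futures as they complete (finished or cancelled futures).

So the for loop is simply consuming that iterator and handing back, one by one, each future yielded by as_completed().

So unless the callback (callback_processing()) or the function we call introduces some kind of asynchronous behaviour (like the example described by @dm03514 above), everything we do inside the for loop happens synchronously, in a single thread.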
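
One way to check this for yourself is to record which thread runs what. In the sketch below, fake_api_call is a hypothetical stand-in for api_call that just reports the thread it ran on. It only shows where the loop body executes and says nothing about what a real api_call does internally.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_api_call(finding_id):
    """Hypothetical stand-in for api_call; returns the worker thread's id."""
    return finding_id, threading.get_ident()


main_thread = threading.get_ident()
worker_threads = set()  # threads that ran fake_api_call
loop_threads = set()    # threads that ran the body of the for loop

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(fake_api_call, i) for i in range(20)]
    for future in as_completed(futures):
        _, worker_id = future.result()
        worker_threads.add(worker_id)
        loop_threads.add(threading.get_ident())
```

The calls themselves run on pool threads, but the loop body (where the question's code calls callback_processing and stores the result) only ever runs on the thread that called as_completed().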

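from concurrent.futures import ThreadPoolExecutor, as_completed

def concurrent_api_call_and_processing(api_call=None, callback_processing=None, lst=None, workers=5):
    output = []
    counter = 0
    with ThreadPoolExecutor(max_workers=workers) as executor:
        future_to_f_detail = {executor.submit(api_call, id): id for id in lst}
        # as_completed() yields the futures to this one thread, one at a time.
        for future in as_completed(future_to_f_detail):
            counter += 1
            print(f"Entering the for loop for time number {counter}")
            try:
                find_details = future.result()
            except Exception as exc:
                # Look the id up from the future; a bare `id` here would be the builtin.
                print(f"Finding {future_to_f_detail[future]} generated an exception: {exc}")
            else:
                f_det = callback_processing(find_details)
                output.append(f_det)
    return output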

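If we have an array of 500 ids and we make 500 calls, and every call yields a future, the message will be printed 500 times, once each time before entering the try block.

In this case it is not necessary to use a Queue to avoid race conditions. When we call submit, the future it returns represents a deferred execution that we can use later.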
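
To illustrate what "deferred" means here, the following sketch holds a worker back with an Event so the state of the future can be inspected before and after the work finishes. slow_call and release are hypothetical names used only for this illustration.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

release = threading.Event()  # hypothetical gate that holds the worker back


def slow_call(finding_id):
    """Hypothetical stand-in for api_call that blocks until released."""
    release.wait(timeout=5)
    return finding_id * 2


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_call, 21)  # returns without waiting for the call
    is_future = isinstance(future, Future)
    done_before = future.done()              # the worker is still blocked
    release.set()                            # let the worker finish
    value = future.result()                  # blocks until the result is ready
    done_after = future.done()
```

submit() hands back the Future straight away, and the value only becomes available once the worker has finished, which is what result() waits for.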

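Some important notes and recommendations:

Ramalho, Luciano: Fluent Python, Chapter 17, "Concurrency with Futures".
Beazley, David: Python Cookbook, Chapter 12, "Concurrency", page 516, "Defining an Actor Task".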

Avoiding race conditions while using ThreadPoolExecutor
