Run an async method 8 times in parallel

Asked: 2013-02-03 14:51:54

Tags: c# .net parallel-processing .net-4.5

How do I convert the following to use Parallel.ForEach?

public async void getThreadContents(String[] threads)
{
    HttpClient client = new HttpClient();
    List<String> usernames = new List<String>();
    int i = 0;

    foreach (String url in threads)
    {
        i++;
        progressLabel.Text = "Scanning thread " + i.ToString() + "/" + threads.Count<String>();
        HttpResponseMessage response = await client.GetAsync(url);
        String content = await response.Content.ReadAsStringAsync();
        String user;
        Predicate<String> userPredicate;
        foreach (Match match in regex.Matches(content))
        {
            user = match.Groups[1].ToString();
            userPredicate = (String x) => x == user;
            if (usernames.Find(userPredicate) != user)
            {
                usernames.Add(match.Groups[1].ToString());
            }
        }
        progressBar1.PerformStep();
    }
}

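I wrote this assuming that asynchronous and parallel processing were the same thing, and I have only just realized they are not. I looked at every question I could find on this, and I can't seem to find an example that works for me. Most of them lack readable variable names; using single-letter variable names that don't explain what they contain is a horrible way to present an example.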

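I normally have between 300 and 2000 entries in the array named threads (it contains URLs of forum threads), and it seems that parallel processing would speed up execution because of the many HTTP requests.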

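Do I have to remove all the asynchrony before I can use Parallel.ForEach (I have nothing async outside the foreach, only variable definitions)? How should I go about doing this? Can I do it without blocking the main thread?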

I am using .NET 4.5, by the way.

4 answers:

Answer 0 (score: 7)

Stephen Toub has a good blog post on implementing a ForEachAsync. Svick's answer is quite good for platforms where Dataflow is available.

Here is an alternative, using the partitioner from the TPL:

public static Task ForEachAsync<T>(this IEnumerable<T> source,
    int degreeOfParallelism, Func<T, Task> body)
{
  var partitions = Partitioner.Create(source).GetPartitions(degreeOfParallelism);
  var tasks = partitions.Select(async partition =>
  {
    using (partition) 
      while (partition.MoveNext()) 
        await body(partition.Current); 
  });
  return Task.WhenAll(tasks);
}

You can then use it like this:

public async Task getThreadContentsAsync(String[] threads)
{
  HttpClient client = new HttpClient();
  ConcurrentDictionary<String, object> usernames = new ConcurrentDictionary<String, object>();

  await threads.ForEachAsync(8, async url =>
  {
    HttpResponseMessage response = await client.GetAsync(url);
    String content = await response.Content.ReadAsStringAsync();
    String user;
    foreach (Match match in regex.Matches(content))
    {
      user = match.Groups[1].ToString();
      usernames.TryAdd(user, null);
    }
    progressBar1.PerformStep();
  });
}

Answer 1 (score: 6)

"I wrote this assuming that asynchronous and parallel processing were the same thing"

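Asynchronous processing and parallel processing are quite different. If you don't understand the difference, I think you should first read more about it (for example, "what is the relation between Asynchronous and parallel programming in c#?").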

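Now, what you want to do is actually not that simple, because you want to process a big collection asynchronously, with a specific degree of parallelism (8). With synchronous processing, you could use Parallel.ForEach() (along with ParallelOptions to configure the degree of parallelism), but there is no simple alternative that works with async.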
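For comparison, the synchronous case mentioned above might look like the following minimal sketch. The class name SyncParallelDemo and the ProcessUrl method are hypothetical stand-ins for the real per-URL work; they are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SyncParallelDemo
{
    // Hypothetical synchronous stand-in for the per-URL work.
    private static int ProcessUrl(string url)
    {
        return url.Length;
    }

    public static int ProcessAll(IEnumerable<string> urls)
    {
        // Cap the number of loop bodies running at the same time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };
        int total = 0;

        Parallel.ForEach(urls, options, url =>
        {
            // The body runs on several threads, so shared state is updated atomically.
            Interlocked.Add(ref total, ProcessUrl(url));
        });

        return total;
    }
}
```

Parallel.ForEach blocks the calling thread until every item has been processed, and it does not wait for the task returned by an async lambda, which is why it is a poor fit for awaited HTTP requests.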

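In your code, this is complicated by the fact that you expect everything to execute on the UI thread. (Ideally, though, you shouldn't access the UI directly from your computation. Instead, you should use IProgress, which would mean the code no longer has to execute on the UI thread.)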
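As a rough illustration of the IProgress idea, the worker below reports a count instead of touching any control. The names ProgressDemo and ScanAsync are hypothetical, and Task.Delay is only a placeholder for the real download.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProgressDemo
{
    // Processes each URL and reports how many have been completed so far.
    public static async Task<int> ScanAsync(IEnumerable<string> urls, IProgress<int> progress)
    {
        int done = 0;
        foreach (string url in urls)
        {
            await Task.Delay(10); // placeholder for the real download of url
            done++;
            if (progress != null)
                progress.Report(done); // the worker never touches UI controls
        }
        return done;
    }
}
```

In a UI event handler, one could pass something like new Progress<int>(n => progressBar1.Value = n), assuming progressBar1.Maximum is set to the number of URLs. Progress<T> captures the SynchronizationContext that is current when it is constructed and invokes the callback there, so creating it on the UI thread keeps the control updates on the UI thread.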

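Probably the best way to do this in .NET 4.5 is to use TPL Dataflow. Its ActionBlock does exactly what you want, but it can be quite verbose (because it is more flexible than what you need). So it makes sense to create a helper method: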

public static Task AsyncParallelForEach<T>(
    IEnumerable<T> source, Func<T, Task> body,
    int maxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
    TaskScheduler scheduler = null)
{
    var options = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = maxDegreeOfParallelism
    };
    if (scheduler != null)
        options.TaskScheduler = scheduler;

    var block = new ActionBlock<T>(body, options);

    foreach (var item in source)
        block.Post(item);

    block.Complete();
    return block.Completion;
}

In your case, you would use it like this:

await AsyncParallelForEach(
    threads, async url => await DownloadUrl(url), 8,
    TaskScheduler.FromCurrentSynchronizationContext());

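Here, DownloadUrl() is an async Task method that processes a single URL (the body of your loop), 8 is the degree of parallelism (which probably shouldn't be a literal constant in real code), and FromCurrentSynchronizationContext() makes sure the code executes on the UI thread.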
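The answer does not show DownloadUrl() itself. One possible shape is sketched below; the ThreadScanner class, the ExtractUsers helper, and in particular the regular expression are assumptions, since the question never shows its regex.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public class ThreadScanner
{
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical pattern; replace it with the regex used in the real code.
    private static readonly Regex UserRegex = new Regex(@"class=""username"">([^<]+)<");

    // Used as a set: the keys are the distinct usernames found so far.
    public ConcurrentDictionary<string, object> Usernames { get; } =
        new ConcurrentDictionary<string, object>();

    // Pulls every captured username out of one page of HTML.
    public static IEnumerable<string> ExtractUsers(string content)
    {
        foreach (Match match in UserRegex.Matches(content))
            yield return match.Groups[1].Value;
    }

    // The loop body: download one URL and record the usernames it contains.
    public async Task DownloadUrl(string url)
    {
        string content = await Client.GetStringAsync(url);
        foreach (string user in ExtractUsers(content))
            Usernames.TryAdd(user, null);
    }
}
```

Keeping the parsing in a separate method means it can be checked without any network access, and TryAdd silently ignores usernames that were already recorded.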

Answer 2 (score: 2)

Another alternative is to use SemaphoreSlim or AsyncSemaphore (which is included in my AsyncEx library and supports many more platforms than SemaphoreSlim):

public async Task getThreadContentsAsync(String[] threads)
{
  SemaphoreSlim semaphore = new SemaphoreSlim(8);
  HttpClient client = new HttpClient();
  ConcurrentDictionary<String, object> usernames = new ConcurrentDictionary<String, object>();

  await Task.WhenAll(threads.Select(async url =>
  {
    await semaphore.WaitAsync();
    try
    {
      HttpResponseMessage response = await client.GetAsync(url);
      String content = await response.Content.ReadAsStringAsync();
      String user;
      foreach (Match match in regex.Matches(content))
      {
        user = match.Groups[1].ToString();
        usernames.TryAdd(user, null);
      }
      progressBar1.PerformStep();
    }
    finally
    {
      semaphore.Release();
    }
  }));
}

Answer 3 (score: 0)

You can try the ParallelForEachAsync extension method described in the docs:

using System.Collections.Async;

public async void getThreadContents(String[] threads)
{
    HttpClient client = new HttpClient();
    List<String> usernames = new List<String>();
    int i = 0;

    await threads.ParallelForEachAsync(async url =>
    {
        i++;
        progressLabel.Text = "Scanning thread " + i.ToString() + "/" + threads.Count<String>();
        HttpResponseMessage response = await client.GetAsync(url);
        String content = await response.Content.ReadAsStringAsync();
        String user;
        Predicate<String> userPredicate;
        foreach (Match match in regex.Matches(content))
        {
            user = match.Groups[1].ToString();
            userPredicate = (String x) => x == user;
            if (usernames.Find(userPredicate) != user)
            {
                usernames.Add(match.Groups[1].ToString());
            }
        }

        // THIS CALL MUST BE THREAD-SAFE!
        progressBar1.PerformStep();
    },
    maxDegreeOfParallelism: 8);
}