How can I download files from every group of links at the same time?

Date: 2013-10-28 19:51:53

Tags: c# winforms parallel-processing

Here is the code:

for (int x = 0; x < imagesSatelliteUrls.Count; x++)
{
    if (!imagesSatelliteUrls[x].StartsWith("http://"))
    {
        imagesSatelliteUrls[x] = stringForSatelliteMapUrls + imagesSatelliteUrls[x];
    }

    using (WebClient client = new WebClient())
    {
        if (!imagesSatelliteUrls[x].Contains("href"))
        {
            client.DownloadFile(imagesSatelliteUrls[x],
                                UrlsDir + "SatelliteImage" + counter.ToString("D6"));
        }
    }

    counter++;
}

It downloads the files one at a time. The List imagesSatelliteUrls contains 260 file links, sorted by group.

For example:

index[0] "Group 1"
index[1] some link ....
index[2] some link ....
.
.
.
index[34] "Group 2"
index[35] some link ....
index[36] some link ....
.
.
.
.
index[71] "Group 3"

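And so on; there are 7 groups. I want it to download the first file of each group together, meaning 7 files downloading in parallel: the first file in groups 1, 2, 3, 4, 5, 6 and 7. Then, whenever a file in any group finishes, it should start downloading the next file from that same group.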

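So at any given moment I would see 7 files downloading, each from a different group. When a file in a group finishes downloading, the download for that group should move on to the next file in the same group.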
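A minimal sketch of the first step this requires, splitting the flat list into one list of links per group, written as C# 9 top-level statements. It assumes, as in the example above, that every header entry starts with "Group " and that each link belongs to the nearest header above it; SplitIntoGroups is a hypothetical helper name, not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns the flat list into one list of links per group.
// Assumes header entries start with "Group " and that each link belongs to the
// closest header above it. Entries before the first header are ignored.
static List<List<string>> SplitIntoGroups(IEnumerable<string> items)
{
    var groups = new List<List<string>>();
    foreach (var item in items)
    {
        if (item.StartsWith("Group ", StringComparison.Ordinal))
            groups.Add(new List<string>());   // a header opens a new group
        else if (groups.Count > 0)
            groups[^1].Add(item);              // a link goes into the current group
    }
    return groups;
}

// Usage with a small mock list (placeholder strings, not real URLs):
var sample = new List<string> { "Group 1", "link a1", "link a2", "Group 2", "link b1" };
var grouped = SplitIntoGroups(sample);
Console.WriteLine(grouped.Count);    // 2
Console.WriteLine(grouped[0].Count); // 2
```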

How can I do this? The client.DownloadFile call I am using now blocks, so it only downloads one file at a time.
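One way to get exactly this behaviour is to run one worker per group: the workers run concurrently, but each one awaits its own downloads in order, so a group's next file starts only when its previous file has finished. The sketch below assumes the links have already been split into one list per group, with the "http://" prefix and "href" filtering already applied. DownloadAllGroupsAsync, its downloadAsync delegate and the SatelliteImage_{group}_{index} naming scheme are hypothetical, chosen so that no two downloads share a file name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: one worker per group, all workers running concurrently.
// Inside a worker the files are awaited one after another, so a group's next
// download starts only when its previous download has finished.
static Task DownloadAllGroupsAsync(
    IReadOnlyList<IReadOnlyList<string>> groups,
    string targetDir,
    Func<string, string, Task> downloadAsync) // (url, filePath) => download task
{
    IEnumerable<Task> workers = groups.Select(async (links, g) =>
    {
        for (int i = 0; i < links.Count; i++)
        {
            // Group number + position gives every download its own file name.
            string path = Path.Combine(targetDir, $"SatelliteImage_{g + 1}_{i:D6}");
            await downloadAsync(links[i], path);
        }
    });

    // Task.WhenAll enumerates the query once, starting every worker,
    // and completes when the last group has finished.
    return Task.WhenAll(workers);
}
```

With WebClient, the delegate could create a new client per call, e.g. async (url, path) => { using (var wc = new WebClient()) await wc.DownloadFileTaskAsync(url, path); }, because a single WebClient instance does not support concurrent operations. On .NET Framework, ServicePointManager.DefaultConnectionLimit defaults to 2 connections per host in client applications, so if all groups come from the same server you may need to raise it to at least 7 to actually see seven downloads in flight.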

I tried downloading in parallel:

Here is the code:

Parallel.For(0, imagesSatelliteUrls.Count, /*new ParallelOptions { MaxDegreeOfParallelism = 20 },*/ x =>
            {
                if (!imagesSatelliteUrls[x].StartsWith("http://"))
                {
                    imagesSatelliteUrls[x] = stringForSatelliteMapUrls + imagesSatelliteUrls[x];
                }

                using (WebClient client = new WebClient())
                {
                    if (!imagesSatelliteUrls[x].Contains("href"))
                    {
                        client.DownloadFile(imagesSatelliteUrls[x],
                                            UrlsDir + "SatelliteImage" + counter.ToString("D6"));
                    }
                }

                counter++;
            }); // end of Parallel.For

The exception is:

System.Net.WebException was unhandled by user code
  HResult=-2146233079
  Message=An exception occurred during a WebClient request.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at WeatherMaps.ExtractImages.<>c__DisplayClass2.<.ctor>b__0(Int32 x) in d:\C-Sharp\WeatherMaps\WeatherMaps\WeatherMaps\ExtractImages.cs:line 145
       at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
  InnerException: System.IO.IOException
       HResult=-2147024864
       Message=The process cannot access the file 'd:\localpath\Urls\SatelliteImage000000' because it is being used by another process.
       Source=mscorlib
       StackTrace:
            at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
            at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
            at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
            at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       InnerException: 

I also tried this code, which throws the same exception:

for (int i = 0; i < 7; i++)
            {
                Task.Factory.StartNew(() =>
                {
                    // Here you can easily implement your checking algo as you see fit
                    while (counter < imagesSatelliteUrls.Count)
                    {
                        if (!imagesSatelliteUrls[count].StartsWith("http://"))
                        {
                            imagesSatelliteUrls[count] = stringForSatelliteMapUrls + imagesSatelliteUrls[count];
                        }
                        using (WebClient client = new WebClient())
                        {
                            if (!imagesSatelliteUrls[count].Contains("href"))
                            {

                                client.DownloadFile(imagesSatelliteUrls[count], UrlsDir + "SatelliteImage" + counter.ToString("D6"));
                            }
                        }

                        lock (this)
                        {
                            count++;
                            counter++;
                        }
                    }
                });
            }


System.Net.WebException was unhandled by user code
  HResult=-2146233079
  Message=An exception occurred during a WebClient request.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at WeatherMaps.ExtractImages.<>c__DisplayClass4.<.ctor>b__2() in d:\C-Sharp\WeatherMaps\WeatherMaps\WeatherMaps\ExtractImages.cs:line 122
       at System.Threading.Tasks.Task.InnerInvoke()
       at System.Threading.Tasks.Task.Execute()
  InnerException: System.IO.IOException
       HResult=-2147024864
       Message=The process cannot access the file 'd:\localpath\Urls\SatelliteImage000000' because it is being used by another process.
       Source=mscorlib
       StackTrace:
            at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
            at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
            at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
            at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       InnerException: 

2 answers:

Answer 0 (score: 1)

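Use Parallel.For, but build each file name from the loop index x instead of the shared counter. In the question's version, several iterations can read the same value of counter before any of them increments it, so they all try to write to the same file (SatelliteImage000000 in the exception), which is what causes the IOException: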

//for (int x = 0; x < imagesSatelliteUrls.Count; x++)
Parallel.For(0, imagesSatelliteUrls.Count, /*new ParallelOptions { MaxDegreeOfParallelism = 20 },*/ x =>
{
    if (!imagesSatelliteUrls[x].StartsWith("http://"))
    {
        imagesSatelliteUrls[x] = stringForSatelliteMapUrls + imagesSatelliteUrls[x];
    }

    using (WebClient client = new WebClient())
    {
        if (!imagesSatelliteUrls[x].Contains("href"))
        {
            client.DownloadFile(imagesSatelliteUrls[x],
                                UrlsDir + "SatelliteImage" + x.ToString("D6"));
        }
    }

    Interlocked.Increment(ref counter); // thread-safe increment (needs using System.Threading;)
}); // end of Parallel.For
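If a running counter is still wanted instead of the loop index, here is a sketch of the thread-safe variant. Interlocked.Increment performs the read, add and write as one atomic operation and returns the new value, so every iteration gets its own number. BuildFileNames is a hypothetical helper; note that the numbers are unique but, unlike x, they do not follow the order of the list.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: builds one file name per Parallel.For iteration.
// Interlocked.Increment does the read-add-write atomically and returns the
// new value, so no two iterations can receive the same number.
static string[] BuildFileNames(int count)
{
    int counter = -1;                 // first Increment returns 0
    var names = new string[count];
    Parallel.For(0, count, x =>
    {
        int n = Interlocked.Increment(ref counter);
        names[x] = "SatelliteImage" + n.ToString("D6");
    });
    return names;
}

var fileNames = BuildFileNames(260);
Console.WriteLine(fileNames.Distinct().Count()); // 260
```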

Answer 1 (score: 0)

I created a self-contained example of how to do this, assuming you add a reference to System.Net.Http.dll and use the HttpClient class.

using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Create a mock list of data
string someImageUrl = "..."; // some test url of an image file
string urlsDirectory = @"C:\Temp"; // some working directory

var urls = new string[7 * 20];

for (int i = 0; i < urls.Length; i += 7)
{
    urls[i] = String.Format("Group {0}", (i / 7) + 1);

    for (int j = 1; j < 7; j++)
    {
        urls[i + j] = someImageUrl;
    }
}

// Download the 6 files of one group concurrently, one group at a time.
var client = new HttpClient();

for (int i = 0; i < urls.Length; i += 7)
{
    var directoryPath = Directory.CreateDirectory(Path.Combine(urlsDirectory, urls[i])).FullName;

    var tasks = urls.Skip(i + 1).Take(6).Select(url =>
    {
        return client.GetAsync(url);
    }).ToArray();

    // Block until every download in this group has completed.
    Task.WaitAll(tasks);

    for (int j = 0; j < tasks.Length; j++)
    {
        var response = tasks[j].Result;
        response.EnsureSuccessStatusCode(); // throw on 404, 500, etc. instead of saving an error page

        // FileMode.Create truncates an existing file; OpenOrCreate would leave
        // stale bytes behind if the new image is smaller than the old one.
        using (var fs = new FileStream(Path.Combine(directoryPath, String.Format("Image {0}.jpg", j + 1)), FileMode.Create))
        {
            using (var responseStream = response.Content.ReadAsStreamAsync().Result)
            {
                responseStream.CopyTo(fs);
            }
        }
    }
}

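One important thing to note: I think you lose some of WebClient's automatic file-name negotiation. For what it's worth, you can see that in my example I simply named the images "Image 1.jpg", "Image 2.jpg", and so on.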

Technically, when you request a file over HTTP, you can request an image with a URL such as:

http://somehost.com/getImage?id=5

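In that case it is hard to say what the file name should be. The standard HTTP way to handle this is a header called Content-Disposition, which tells the HTTP client what the file should be named.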
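With HttpClient, that header is exposed as response.Content.Headers.ContentDisposition. A minimal sketch, assuming you want to prefer the RFC 5987 filename* parameter (FileNameStar) over the plain filename parameter; GetServerFileName is a hypothetical helper that returns null when the server sends no usable name, so the caller can fall back to something else.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: returns the file name suggested by the server's
// Content-Disposition header, or null if there is none.
static string? GetServerFileName(HttpResponseMessage response)
{
    ContentDispositionHeaderValue? cd = response.Content?.Headers.ContentDisposition;

    // Prefer filename* (RFC 5987, may carry non-ASCII names), then filename.
    string? name = cd?.FileNameStar ?? cd?.FileName;
    if (string.IsNullOrWhiteSpace(name))
        return null;

    // The plain filename value can come back still quoted; strip the quotes
    // and keep only the file-name part, dropping any directory part.
    return Path.GetFileName(name.Trim('"'));
}
```

A caller might then use GetServerFileName(response) and, when it returns null, derive a name from the URL instead.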

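Not every web server will send you a Content-Disposition header, so you need a fallback that tries to turn the URL above into a Windows-compatible file name. You could look for a simple function that strips every character that is not allowed in NTFS file names from the URL. Keep in mind, though, that in this case you won't get an extension (jpg, gif, etc.). The server may send a Content-Type header telling you the MIME type, such as "image/jpeg", but it is up to you to decide which extension to give the file.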
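A sketch of that fallback: replace the characters Windows does not allow in file names (< > : " / \ | ? * and control characters) with underscores, and choose an extension from the Content-Type media type, which HttpClient exposes as response.Content.Headers.ContentType?.MediaType. The three-entry MIME table and the FileNameFromUrl name are illustrative assumptions; the sketch does not handle reserved names such as CON, trailing dots or spaces, or path-length limits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical fallback: derive a Windows-safe file name from the URL and pick
// an extension from the Content-Type media type (e.g. "image/jpeg").
static string FileNameFromUrl(string url, string? mediaType)
{
    // Characters Windows rejects in file names; control characters are replaced too.
    const string invalid = "<>:\"/\\|?*";
    string name = new string(url
        .Select(c => invalid.IndexOf(c) >= 0 || char.IsControl(c) ? '_' : c)
        .ToArray());

    // Illustrative, deliberately incomplete MIME type to extension table.
    var extensions = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
        ["image/jpeg"] = ".jpg",
        ["image/png"] = ".png",
        ["image/gif"] = ".gif",
    };

    return mediaType != null && extensions.TryGetValue(mediaType, out var ext)
        ? name + ext
        : name;
}

Console.WriteLine(FileNameFromUrl("http://somehost.com/getImage?id=5", "image/jpeg"));
// prints http___somehost.com_getImage_id=5.jpg
```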