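Automating picture downloads from a website that requires authentication

Time: 2015-01-26 01:32:27

Tags: .net vb.net http cookies httpwebrequest

My goal is to automate downloading all the pictures from a website that requires a login (a web-form-based login, I think).

Website: http://www.cgwallpapers.com

Login URL: http://www.cgwallpapers.com/login.php

Members area URL: http://www.cgwallpapers.com/members

An example wallpaper URL that only registered members can access and download: http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080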

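Knowing that viewwallpaper.php takes two parameters, the wallpaper id (from x to y) and the wallpaper res, I want to write a For loop that generates all the combinations to automate the wallpaper downloads.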
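A minimal sketch of such a loop, assuming the id range and the list of resolutions are known in advance. BuildWallpaperUrls is a hypothetical helper name, and the code only builds the URLs; it does not download anything.

```vbnet
Imports System
Imports System.Collections.Generic

Module WallpaperUrls
    ' Hypothetical helper: builds one URL per (id, resolution) pair.
    Public Function BuildWallpaperUrls(firstId As Integer, lastId As Integer,
                                       resolutions As IEnumerable(Of String)) As List(Of String)
        Const baseUrl As String = "http://www.cgwallpapers.com/members/viewwallpaper.php"
        Dim urls As New List(Of String)

        For id As Integer = firstId To lastId
            For Each res As String In resolutions
                ' Same query format as the example URL in the question
                urls.Add(String.Format("{0}?id={1}&res={2}", baseUrl, id, res))
            Next
        Next

        Return urls
    End Function
End Module
```

Each generated URL can then be passed to whatever download routine ends up working. Only 1920x1080 is mentioned in the question, so any other resolution value passed in would be an assumption about what the site accepts.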

The first thing I tried was to use a WebClient like this:

Dim client As New WebClient()
client.Credentials = New System.Net.NetworkCredential("user", "pass")
client.DownloadFile("http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080", "C:\file.jpg")

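But that does not work: it returns HTML text content instead of an image. From what I have read, I think this is because I need to pass the login cookie.

So I have looked at and studied many examples, on StackOverflow and other sites, of how to log in and download a file through HttpWebRequest, because that seems to be the proper way.

This is how I log in to the website, and I get the correct login cookie (or at least I think so):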

Dim logincookie As CookieContainer

Dim url As String = "http://www.cgwallpapers.com/login.php"
Dim postData As String = "action=go&email=MyUsername&wachtwoord=MyPassword"
Dim tempCookies As New CookieContainer
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)

Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
    .Method = "POST"
    .Host = "www.cgwallpapers.com"
    .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    .Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
    .Headers.Add("Accept-Encoding: gzip, deflate")
    .ContentType = "application/x-www-form-urlencoded"
    .UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
    .Referer = "http://www.cgwallpapers.com/login.php"
    .KeepAlive = True

    postReq.CookieContainer = tempCookies
    postReq.ContentLength = byteData.Length
End With

Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
    .Write(byteData, 0, byteData.Length)
    .Close()
End With

Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)

tempCookies.Add(postresponse.Cookies)
logincookie = tempCookies

postresponse.Close()
postreqstream.Close()

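At this point I am stuck, because I am not sure how to use the login cookie I obtained to download a picture.

I suppose that after getting the login cookie I should simply perform another request to the desired wallpaper URL using the saved cookie, shouldn't I? But I think I am doing it wrong: the next code does not work, postresponse.ContentLength is always -1, so I cannot write the file.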

Dim url As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?"
Dim postData As String = "id=1764&res=1920x1080"

Dim byteData As Byte() = Encoding.GetBytes(postData)

Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
    .Method = "POST"
    .Host = "www.cgwallpapers.com"
    .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    .Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
    .Headers.Add("Accept-Encoding: gzip, deflate")
    .ContentType = "application/x-www-form-urlencoded"
    .UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
    .KeepAlive = True
    ' .Referer = ""

    .CookieContainer = logincookie
    .ContentLength = byteData.Length
End With

Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
    .Write(byteData, 0, byteData.Length)
    .Close()
End With

Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)

Dim memStream As MemoryStream
Using rdr As Stream = postresponse.GetResponseStream
    Dim count As Integer = Convert.ToInt32(postresponse.ContentLength)
    Dim buffer As Byte() = New Byte(count) {}
    Dim bytesRead As Integer
    Do
        bytesRead += rdr.Read(buffer, bytesRead, count - bytesRead)
    Loop Until bytesRead = count
    rdr.Close()
    memStream = New MemoryStream(buffer)
End Using

File.WriteAllBytes("c:\wallpaper.jpg", memStream.ToArray)
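On the ContentLength of -1: HttpWebResponse.ContentLength is -1 when the response carries no Content-Length header, so the body cannot be read into a buffer sized in advance. A minimal sketch that reads until the stream ends instead. The names ReadToEnd and DownloadWithCookies are hypothetical, and using a GET request with the parameters in the query string is an assumption, not something confirmed for this site.

```vbnet
Imports System.IO
Imports System.Net

Module ResponseReader
    ' Reads a stream to its end without knowing its length in advance.
    Public Function ReadToEnd(source As Stream) As Byte()
        Using collected As New MemoryStream()
            source.CopyTo(collected)    ' Copies until the source reports no more data
            Return collected.ToArray()
        End Using
    End Function

    ' Hypothetical usage: a GET request that reuses the cookies obtained at login.
    Public Sub DownloadWithCookies(url As String, cookies As CookieContainer, targetPath As String)
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "GET"
        request.CookieContainer = cookies

        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using body As Stream = response.GetResponseStream()
                File.WriteAllBytes(targetPath, ReadToEnd(body))
            End Using
        End Using
    End Sub
End Module
```

Whether the bytes written are actually an image still depends on the server accepting the cookies; if the login did not take effect, the same HTML page would be saved instead.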

How can I fix this so that the wallpapers download properly?

2 answers:

Answer 0: (score: 2)

Private Function DownloadImage() As String
    Dim remoteImgPath As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080"
    Dim remoteImgPathUri As New Uri(remoteImgPath)
    Dim remoteImgPathWithoutQuery As String = remoteImgPathUri.GetLeftPart(UriPartial.Path)
    Dim fileName As String = Path.GetFileName(remoteImgPathWithoutQuery)
    Dim localPath As String = Convert.ToString(AppDomain.CurrentDomain.BaseDirectory + "LocalFolder\Images\Originals\") & fileName
    Dim webClient As New WebClient()
    webClient.DownloadFile(remoteImgPath, localPath)
    Return localPath
End Function

I threw this together; I think it points in the right direction.

    Try
        Dim theFile As String = "c:\wallpaper.jpg"
        Dim fileName As String = Path.GetFileName(theFile)

        Dim ms = New MemoryStream(File.ReadAllBytes(theFile))

        Dim dataLengthToRead As Long = ms.Length
        Dim blockSize As Integer = If(dataLengthToRead >= 5000, 5000, CInt(dataLengthToRead))
        Dim buffer As Byte() = New Byte(blockSize - 1) {}

        Response.Clear()
        Response.ClearContent()
        Response.ClearHeaders()
        Response.BufferOutput = True

        ' Send a single Content-Disposition header; use "inline" instead to display the image in the browser
        Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName)

        ' Content-Length must be the size of the whole file, not of one block
        Response.AddHeader("Content-Length", dataLengthToRead.ToString())
        Response.ContentType = "image/jpeg"

        ' Stream the file to the client in blocks of at most blockSize bytes
        While dataLengthToRead > 0 AndAlso Response.IsClientConnected
            Dim lengthRead As Int32 = ms.Read(buffer, 0, blockSize)
            Response.OutputStream.Write(buffer, 0, lengthRead)
            Response.Flush()
            dataLengthToRead = dataLengthToRead - lengthRead
        End While

        Response.Flush()
        Response.Close()

    Catch ex As Exception
        ' Errors are silently ignored here
    End Try

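Answer 1: (score: 2)

Here is a complete solution to your problem that uses HttpWebRequest and HttpWebResponse to simulate the requests a browser would make. I have commented most of the code to give you an idea of how it all works.

You must change the sUsername and sPassword variables to your own username and password to log in to the site successfully.

Optional variables that you may want to change:

  • sDownloadPath: currently set to the same folder as the application exe. Change it to the path where you want the images downloaded.
  • sImageResolution: defaults to 1920x1080, which is what you specified in your original question. Change this to any resolution value the website accepts. Just a warning: I am not 100% sure that every image is available in the same resolutions, so changing this value may cause some images to be skipped if they are not available in the desired resolution.
  • nMaxErrorsInSuccession: defaults to 10. Once logged in, the application keeps incrementing the image ID and attempts to download a new image. Some IDs will not contain an image, which is normal because images may have been deleted on the server (or an image may not be available in the desired resolution). If the application fails to download an image nMaxErrorsInSuccession times in a row, it stops, because we assume the last image has been reached. If more than 10 consecutive images have been deleted or are unavailable in the selected resolution, you may need to increase this value.
  • nCurrentID: defaults to 1. This is the image ID the website uses to determine which image to serve to the client. The nCurrentID variable is incremented by 1 on every download attempt. Depending on time and circumstances, you may not be able to download all the images in one session. If so, you can note the ID you left off at and update this variable accordingly so that the next run starts from that ID. This is also useful once you have downloaded all the images and want to run the application again later to fetch newer ones.
  • sUserAgent: can be any user agent you want. It currently uses Firefox 35.0 on Windows 7. Note that some websites behave differently depending on the user agent you specify, so only change this if you really need to emulate another browser.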
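The stopping rule described for nMaxErrorsInSuccession can be sketched in isolation as follows. This is only an illustration: CountAttempts is a hypothetical helper, and the sequence of Boolean values stands in for the results of real download attempts.

```vbnet
Imports System
Imports System.Collections.Generic

Module StopRule
    ' Returns how many IDs are tried before maxErrorsInSuccession
    ' consecutive failures are seen (or the sequence runs out).
    Public Function CountAttempts(results As IEnumerable(Of Boolean),
                                  maxErrorsInSuccession As Integer) As Integer
        Dim attempts As Integer = 0
        Dim errorsInSuccession As Integer = 0

        For Each downloaded As Boolean In results
            attempts += 1

            If downloaded Then
                errorsInSuccession = 0      ' A success resets the streak
            Else
                errorsInSuccession += 1     ' Another failure in a row
            End If

            If errorsInSuccession >= maxErrorsInSuccession Then Exit For
        Next

        Return attempts
    End Function
End Module
```

Because a single success resets the counter, isolated gaps in the ID sequence do not end the run; only an unbroken streak of failures does.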

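Note: 3-second pauses have been strategically inserted at various points in the code. Some websites run anti-hammering scripts that block or even ban users who browse the site too quickly. Removing these lines would shorten the time needed to download all the images, but I do not recommend it.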

    Imports System.Net
    Imports System.IO

    Public Class Form2
        Const sUsername As String = "USERNAMEHERE"
        Const sPassword As String = "PASSWORDHERE"
        Const sImageResolution As String = "1920x1080"
        Const sUserAgent As String = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
        Const sMainURL As String = "http://www.cgwallpapers.com/"
        Const sCheckLoginURL As String = "http://www.cgwallpapers.com/login.php"
        Const sDownloadURLLeft As String = "http://www.cgwallpapers.com/members/getwallpaper.php?id="
        Const sDownloadURLRight As String = "&res="
        Private oCookieCollection As CookieCollection = Nothing
        Private nMaxErrorsInSuccession As Int32 = 10
        Private nCurrentID As Int32 = 1
        Private sDownloadPath As String = Application.StartupPath

        Private Sub Form2_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            StartScrape()
        End Sub

        Private Sub StartScrape()
            Try
                Dim bContinue As Boolean = True

                Dim sPostData(5) As String

                sPostData(0) = UrlEncode("action")
                sPostData(1) = UrlEncode("go")
                sPostData(2) = UrlEncode("email")
                sPostData(3) = UrlEncode(sUsername)
                sPostData(4) = UrlEncode("wachtwoord")
                sPostData(5) = UrlEncode(sPassword)

                If GetMethod(sMainURL) = True Then
                    If SetMethod(sCheckLoginURL, sPostData, sMainURL) = True Then
                        ' Login successful

                        Dim nErrorsInSuccession As Int32 = 0

                        Do Until nErrorsInSuccession >= nMaxErrorsInSuccession
                            If DownloadImage(sDownloadURLLeft, sDownloadURLRight, sMainURL, nCurrentID) = True Then
                                ' Always reset error count when we successfully download
                                nErrorsInSuccession = 0
                            Else
                                ' Add one to error count because there was no image at the current id
                                nErrorsInSuccession += 1
                            End If

                            nCurrentID += 1
                            Threading.Thread.Sleep(3000)    ' Wait 3 seconds to prevent loading pages too quickly
                        Loop

                        MessageBox.Show("Finished downloading images")
                    End If
                Else
                    MessageBox.Show("Error connecting to main site. Are you connected to the internet?")
                End If
            Catch ex As Exception
                MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
            End Try
        End Sub

        Private Function GetMethod(ByVal sPage As String) As Boolean
            Dim req As HttpWebRequest
            Dim resp As HttpWebResponse
            Dim stw As StreamReader
            Dim bReturn As Boolean = True

            Try
                req = HttpWebRequest.Create(sPage)
                req.Method = "GET"
                req.AllowAutoRedirect = False
                req.UserAgent = sUserAgent
                req.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
                req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
                req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
                req.Headers.Add("Keep-Alive", "300")
                req.KeepAlive = True

                resp = req.GetResponse        ' Get the response from the server

                If req.HaveResponse Then
                    ' Save the cookie info (the header is missing when the server sets no cookies)

                    Dim sSetCookie As String = resp.Headers("Set-Cookie")

                    If sSetCookie IsNot Nothing Then
                        SaveCookies(sSetCookie)
                    End If

                    stw = New StreamReader(resp.GetResponseStream)
                    stw.ReadToEnd()    ' Read the response from the server, but we do not save it
                    stw.Close()
                    resp.Close()
                Else
                    MessageBox.Show("No response received from host " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                    bReturn = False
                End If
            Catch exc As WebException
                MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                bReturn = False
            End Try

            Return bReturn
        End Function

        Private Function SetMethod(ByVal sPage As String, ByVal sPostData() As String, sReferer As String) As Boolean
            Dim bReturn As Boolean = False
            Dim req As HttpWebRequest
            Dim resp As HttpWebResponse
            Dim str As StreamWriter
            Dim sPostDataValue As String = ""
            Dim nInitialCookieCount As Int32 = 0

            Try
                req = HttpWebRequest.Create(sPage)
                req.Method = "POST"
                req.UserAgent = sUserAgent
                req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
                req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
                req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
                req.Referer = sReferer
                req.ContentType = "application/x-www-form-urlencoded"
                req.Headers.Add("Keep-Alive", "300")

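                ' Always attach a cookie container so that cookies set by the login response can be captured
                If oCookieCollection IsNot Nothing Then
                    ' Pass cookie info from the login page
                    req.CookieContainer = SetCookieContainer(sPage)
                Else
                    req.CookieContainer = New CookieContainer()
                End If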

                str = New StreamWriter(req.GetRequestStream)

                If sPostData.Count Mod 2 = 0 Then
                    ' There is an even number of post names and values

                    For i As Int32 = 0 To sPostData.Count - 1 Step 2
                        ' Put the post data together into one string
                        sPostDataValue &= sPostData(i) & "=" & sPostData(i + 1) & "&"
                    Next i

                    sPostDataValue = sPostDataValue.Substring(0, sPostDataValue.Length - 1) ' This will remove the extra "&" at the end that was added from the for loop above

                    ' Post the data to the server

                    str.Write(sPostDataValue)
                    str.Close()

                    ' Get the response

                    nInitialCookieCount = req.CookieContainer.Count
                    resp = req.GetResponse

                    If req.CookieContainer.Count > nInitialCookieCount Then
                        ' Login successful
                        ' Save new login cookies

                        SaveCookies(req.CookieContainer)
                        bReturn = True
                    Else
                        MessageBox.Show("The email or password you entered are incorrect." & vbCrLf & vbCrLf & "Please try again.", "Unable to log in", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
                        bReturn = False
                    End If
                Else
                    ' Did not specify the correct amount of parameters so we cannot continue
                    MessageBox.Show("POST error.  Did not supply the correct amount of post data for " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                    bReturn = False
                End If
            Catch ex As Exception
                MessageBox.Show("POST error.  " & ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                bReturn = False
            End Try

            Return bReturn
        End Function

        Private Function DownloadImage(ByVal sPageLeft As String, sPageRight As String, sReferer As String, nCurrentID As Int32) As Boolean
            Dim req As HttpWebRequest
            Dim bReturn As Boolean = False
            Dim sPage As String = sPageLeft & nCurrentID.ToString & sPageRight & sImageResolution

            Try
                req = HttpWebRequest.Create(sPage)
                req.Method = "GET"
                req.AllowAutoRedirect = False
                req.UserAgent = sUserAgent
                req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
                req.Headers.Add("Accept-Language", "en-US,en;q=0.5")
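                ' Let the framework send Accept-Encoding and transparently decompress gzip/deflate responses
                req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate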
                req.Headers.Add("Keep-Alive", "300")
                req.KeepAlive = True

                If oCookieCollection IsNot Nothing Then
                    ' Pass cookie info so that we remain logged in
                    req.CookieContainer = SetCookieContainer(sPage)
                End If

                ' Save file to disk

                Using oResponse As System.Net.WebResponse = CType(req.GetResponse, System.Net.WebResponse)
                    Dim sContentDisposition As String = CType(oResponse, System.Net.HttpWebResponse).Headers("Content-Disposition")

                    If sContentDisposition IsNot Nothing Then
                        ' There is an image to download

                        Dim sFilename As String = sContentDisposition.Substring(sContentDisposition.IndexOf("filename="), sContentDisposition.Length - sContentDisposition.IndexOf("filename=")).Replace("filename=", "").Replace("""", "").Replace(";", "").Trim

                        Using responseStream As IO.Stream = oResponse.GetResponseStream
                            Using fs As New IO.FileStream(System.IO.Path.Combine(sDownloadPath, sFilename), FileMode.Create, FileAccess.Write)
                                Dim buffer(2047) As Byte
                                Dim read As Integer

                                Do
                                    read = responseStream.Read(buffer, 0, buffer.Length)
                                    fs.Write(buffer, 0, read)
                                Loop Until read = 0

                                responseStream.Close()
                                fs.Flush()
                                fs.Close()
                            End Using

                            responseStream.Close()
                        End Using

                        bReturn = True
                    End If

                    oResponse.Close()
                End Using
            Catch exc As WebException
                MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                bReturn = False
            End Try

            Return bReturn
        End Function

        Private Function SetCookieContainer(sPage As String) As System.Net.CookieContainer
            Dim oCookieContainerObject As New System.Net.CookieContainer
            Dim oCookie As System.Net.Cookie

            For c As Int32 = 0 To oCookieCollection.Count - 1
                If IsDate(oCookieCollection(c).Value) = False Then
                    oCookie = New System.Net.Cookie
                    oCookie.Name = oCookieCollection(c).Name
                    oCookie.Value = oCookieCollection(c).Value
                    oCookie.Domain = New Uri(sPage).Host
                    oCookie.Secure = False
                    oCookieContainerObject.Add(oCookie)
                End If
            Next

            Return oCookieContainerObject
        End Function

        Private Sub SaveCookies(sCookieString As String)
            ' Convert cookie string to global cookie collection object

            Dim sCookieStrings() As String = sCookieString.Trim.Replace("path=/,", "").Replace("path=/", "").Split(";".ToCharArray())

            oCookieCollection = New CookieCollection

            For Each sCookie As String In sCookieStrings
                ' Split on the first "=" only, and skip attributes such as HttpOnly that have no value
                Dim sParts() As String = sCookie.Trim().Split("=".ToCharArray(), 2)

                If sParts.Length = 2 AndAlso sParts(0).Trim <> "" Then
                    oCookieCollection.Add(New Cookie(sParts(0).Trim, sParts(1).Trim))
                End If
            Next
        End Sub

        Private Sub SaveCookies(oCookieContainer As CookieContainer)
            ' Convert cookie container object to global cookie collection object

            oCookieCollection = New CookieCollection

            For Each oCookie As System.Net.Cookie In oCookieContainer.GetCookies(New Uri(sMainURL))
                oCookieCollection.Add(oCookie)
            Next
        End Sub

        Private Function UrlEncode(ByRef URLText As String) As String
            Dim AscCode As Integer
            Dim EncText As String = ""
            Dim bStr() As Byte = System.Text.Encoding.ASCII.GetBytes(URLText)

            Try
                For i As Long = 0 To UBound(bStr)
                    AscCode = bStr(i)

                    Select Case AscCode
                        Case 48 To 57, 65 To 90, 97 To 122, 46, 95
                            EncText = EncText & Chr(AscCode)

                        Case 32
                            EncText = EncText & "+"

                        Case Else
                            If AscCode < 16 Then
                                EncText = EncText & "%0" & Hex(AscCode)
                            Else
                                EncText = EncText & "%" & Hex(AscCode)
                            End If

                    End Select
                Next i

                Erase bStr
            Catch ex As WebException
                MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
            End Try

            Return EncText
        End Function
    End Class
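As an alternative to the chained Substring/Replace calls that extract the filename in DownloadImage, the header value could be parsed with the System.Net.Mime.ContentDisposition class. This is only a sketch: GetFileNameFromDisposition is a hypothetical helper, and the exact header format sent by the site is an assumption.

```vbnet
Imports System
Imports System.Net.Mime

Module DispositionParsing
    ' Returns the filename from a Content-Disposition header value,
    ' or Nothing when the value is missing or cannot be parsed.
    Public Function GetFileNameFromDisposition(headerValue As String) As String
        If String.IsNullOrEmpty(headerValue) Then Return Nothing

        Try
            ' The class handles quoted and unquoted filename parameters
            Return New ContentDisposition(headerValue).FileName
        Catch ex As FormatException
            Return Nothing
        End Try
    End Function
End Module
```

A Nothing result can be treated the same way as a missing Content-Disposition header, that is, as "no image at this ID".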