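I'm trying to scrape a number of websites that run a lot of JavaScript on the DOM before it finishes loading. That means I'm using `WebBrowser` instead of the friendlier `WebClient`. The problem I'm trying to solve is waiting until the `WebBrowser.DocumentCompleted` event fires and only then returning `WebBrowser.Document`. I then do some post-processing on the `HtmlDocument`, but I can't get it returned yet.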
```fsharp
open System.Threading
open System.Windows.Forms

let downloadWebSite (address : string) =
    let browser = new WebBrowser()
    let browserContext = SynchronizationContext()
    browser.DocumentCompleted.Add (fun _ ->
        printfn "Document Loaded")
    async {
        do browser.Navigate(address)
        let! a = Async.AwaitEvent browser.DocumentCompleted
        do! Async.SwitchToContext(browserContext)
        return browser.Document
    }

[downloadWebSite "https://www.google.com"]
|> Async.Parallel // there will be more addresses once this works
|> Async.RunSynchronously
```
Running it in FSI fails with:

```
System.InvalidCastException: Specified cast is not valid.
   at System.Windows.Forms.UnsafeNativeMethods.IHTMLDocument2.GetLocation()
   at System.Windows.Forms.WebBrowser.get_Document()
   at FSI_0058.downloadWebSite@209-41.Invoke(Unit _arg2) in C:\Temp\Untitled-1.fsx:line 209
   at Microsoft.FSharp.Control.AsyncPrimitives.CallThenInvokeNoHijackCheck[a,b](AsyncActivation`1 ctxt, FSharpFunc`2 userCode, b result1)
   at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.FSharp.Control.AsyncResult`1.Commit()
   at Microsoft.FSharp.Control.AsyncPrimitives.RunSynchronouslyInAnotherThread[a](CancellationToken token, FSharpAsync`1 computation, FSharpOption`1 timeout)
   at Microsoft.FSharp.Control.AsyncPrimitives.RunSynchronously[T](CancellationToken cancellationToken, FSharpAsync`1 computation, FSharpOption`1 timeout)
   at Microsoft.FSharp.Control.FSharpAsync.RunSynchronously[T](FSharpAsync`1 computation, FSharpOption`1 timeout, FSharpOption`1 cancellationToken)
   at <StartupCode$FSI_0058>.$FSI_0058.main@()
Stopped due to error
```
Several existing questions (1, 2, 3) lead me to believe that I'm accessing the `WebBrowser` from the wrong thread.
Am I using `Async.SwitchToContext(browserContext)` incorrectly? What is the right way to get at `WebBrowser.Document`?

Answer
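The problem is this line:

```fsharp
let browserContext = SynchronizationContext()
```

You manually create a new instance of `SynchronizationContext`, but it is not associated with the UI thread (or with any particular thread): the base class's `Post` method simply queues work to the thread pool. So after `Async.SwitchToContext(browserContext)` the workflow continues on a thread-pool thread, and that is why the program crashes when it accesses `browser.Document`, which must be accessed on the UI thread.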
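To see why the hand-made context does not help, the following self-contained F# sketch (using only `System.Threading`; the helper name `continuesOnThreadPool` is made up for illustration) switches to a freshly constructed `SynchronizationContext` and checks where the workflow resumes. Because the base class's `Post` queues the callback to the thread pool, the code after the switch should find itself on a thread-pool thread rather than on any UI thread.

```fsharp
open System.Threading

/// Hypothetical helper: switches to `ctx`, then reports whether the code after
/// the switch is running on a thread-pool thread.
let continuesOnThreadPool (ctx : SynchronizationContext) =
    async {
        do! Async.SwitchToContext ctx
        return Thread.CurrentThread.IsThreadPoolThread
    }
    |> Async.RunSynchronously

// A freshly constructed base SynchronizationContext belongs to no UI thread:
// its Post method simply hands the callback to the thread pool.
let onPool = continuesOnThreadPool (SynchronizationContext())
printfn "Continued on a thread-pool thread: %b" onPool
```

In the question's code the same thing happens, so `browser.Document` ends up being read from a thread-pool thread.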
To fix this, use the existing `SynchronizationContext` that is already associated with the UI thread:

```fsharp
let browserContext = SynchronizationContext.Current
```
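This assumes that `downloadWebSite` is called on the UI thread, because `SynchronizationContext.Current` returns the context of whichever thread reads it. If that is not the case, you can pass the UI context into the function from wherever it is available, or store it in a global variable.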
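`Async.SwitchToContext` ensures that the next line accesses and returns the document on the UI thread, but the client code that receives the document may still run on a non-UI thread. A better design is to use a continuation function: instead of returning the document directly, return the value (of some type `SomeType`) produced by a continuation function that is passed to `downloadWebSite` as an argument. This way, the continuation is guaranteed to run on the UI thread: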
```fsharp
open System.Threading
open System.Windows.Forms

let downloadWebSite (address : string) cont =
    let browser = new WebBrowser()
    let browserContext = SynchronizationContext.Current
    browser.DocumentCompleted.Add (fun _ ->
        printfn "Document Loaded")
    async {
        do browser.Navigate(address)
        let! _ = Async.AwaitEvent browser.DocumentCompleted
        do! Async.SwitchToContext(browserContext)
        // the cont function is ensured to be run on the UI thread:
        return cont browser.Document
    }

// Runs on the UI thread, so it can safely access the document (e.g. read its title).
let readTitle (document : HtmlDocument) = document.Title

[downloadWebSite "https://www.google.com" readTitle]
|> Async.Parallel
|> Async.RunSynchronously
```
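The continuation pattern does not depend on `WebBrowser` itself. The sketch below pulls it out into a generic helper, `runThenContinueOn`, that takes the UI context explicitly as a parameter, which also covers the case where the caller is not on the UI thread. Both `runThenContinueOn` and `SingleThreadContext` are hypothetical names; `SingleThreadContext` is a minimal stand-in for a real UI thread so that the example runs without WinForms.

```fsharp
open System.Collections.Concurrent
open System.Threading

/// Hypothetical stand-in for a UI thread: one dedicated background thread that
/// executes, in order, every callback posted to this SynchronizationContext.
type SingleThreadContext() =
    inherit SynchronizationContext()
    let queue = new BlockingCollection<SendOrPostCallback * obj>()
    let worker =
        Thread(ThreadStart(fun () ->
            for (callback, state) in queue.GetConsumingEnumerable() do
                callback.Invoke state))
    do
        worker.IsBackground <- true
        worker.Start()
    member x.ThreadId = worker.ManagedThreadId
    override x.Post(callback, state) = queue.Add((callback, state))

/// Hypothetical generic form of the answer's pattern: run `work` on whatever
/// thread it happens to use, then hop to `uiContext` and apply `cont` there.
let runThenContinueOn (uiContext : SynchronizationContext) (work : Async<'a>) (cont : 'a -> 'b) : Async<'b> =
    async {
        let! result = work
        // Everything after this line is delivered through uiContext.Post.
        do! Async.SwitchToContext uiContext
        return cont result
    }

let ui = SingleThreadContext()
let result =
    runThenContinueOn (ui :> SynchronizationContext) (async { return 21 }) (fun x ->
        // Return the computed value and whether we are on the simulated UI thread.
        x * 2, Thread.CurrentThread.ManagedThreadId = ui.ThreadId)
    |> Async.RunSynchronously
printfn "%A" result
```

One caveat: because the continuation is delivered through the UI context's `Post`, the UI thread must be free to process it. If `Async.RunSynchronously` is called on that same UI thread, it blocks the very thread the continuation is waiting for, so in a real WinForms application starting the workflow with `Async.StartImmediate` (and handling the result inside the continuation) may be the safer choice.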