Question

I'm trying to get colly to scrape the following page: https://www56.muenchen.de/termin/index.php?loc=BB

Here is my code:

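package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector(
        colly.IgnoreRobotsTxt(),
        colly.Async(false),
    )

    // Print the text of the whole document as returned by the server.
    c.OnHTML("html", func(e *colly.HTMLElement) {
        fmt.Println(e.Text)
    })

    c.OnError(func(_ *colly.Response, err error) {
        log.Println("Something went wrong:", err)
    })

    // Register every callback before Visit: with a synchronous collector,
    // a callback added afterwards is never called for this request.
    c.OnScraped(func(r *colly.Response) {
        fmt.Println("Finished")
    })

    if err := c.Visit("https://www56.muenchen.de/termin/index.php?loc=BB"); err != nil {
        log.Println("Visit failed:", err)
    }
}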

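The problem is that the site loads some of its content only after the page has been visited. I'm not sure how to tell colly to "wait" until that has happened and then look at the result.

Looking forward to some ideas.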

Answer 1

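colly does not do anything on the client side: it only fetches and parses the HTML the server returns. In particular, colly does not execute JavaScript, so content loaded via Ajax never appears in the response it sees.

To emulate a browser, you can use selenium or phantomjs, as shown in the link above.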
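To illustrate why waiting does not help, here is a minimal sketch using only the Go standard library. It assumes a hypothetical test page (`page`) whose content is filled in by a script, and the helper names `fetchStatic` and `divIsEmpty` are made up for this example. Any client that does not run JavaScript, colly included, receives the markup with the placeholder still empty.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// page is a hypothetical stand-in for a site that fills in its
// content with JavaScript after the initial HTML has loaded.
const page = `<html><body>
<div id="slots"></div>
<script>document.getElementById("slots").textContent = "loaded later";</script>
</body></html>`

// fetchStatic returns the raw body exactly as the server sends it,
// which is all a client without a JavaScript engine ever sees.
func fetchStatic(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// divIsEmpty reports whether the placeholder div is still empty
// in the given HTML, i.e. the script has not been executed.
func divIsEmpty(body string) bool {
	return strings.Contains(body, `<div id="slots"></div>`)
}

func main() {
	// Serve the hypothetical page from a local test server.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, page)
	}))
	defer srv.Close()

	body, err := fetchStatic(srv.URL)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("placeholder still empty:", divIsEmpty(body))
}
```

The script arrives only as text inside the response, so no amount of waiting changes what the scraper receives. Getting the filled-in content requires something that actually runs the script, such as the browser automation tools mentioned above.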

Web crawling after the site content has been loaded via Ajax
