Simple solution for the golang tour webcrawler exercise

Date: 2017-03-09 08:22:34

Tags: go concurrency channel

I'm new to Go. I've looked at some existing solutions, but they seem complicated to me...

My solution looks simple, but I get a deadlock error. I can't figure out how to close the channels properly and stop the loop in main. Is there a simple way to do this?

Solution on Golang playground

Thanks for any and all help!

package main

import (
    "fmt"
    "sync"
)

type Fetcher interface {
    // Fetch returns the body of URL and
    // a slice of URLs found on that page.
    Fetch(url string) (body string, urls []string, err error)
}

type SafeCache struct {
    cache map[string]bool
    mux   sync.Mutex
}

func (c *SafeCache) Set(s string) {
    c.mux.Lock()
    c.cache[s] = true
    c.mux.Unlock()
}

func (c *SafeCache) Get(s string) bool {
    c.mux.Lock()
    defer c.mux.Unlock()
    return c.cache[s]
}

var (
    sc = SafeCache{cache: make(map[string]bool)}
    errs, ress = make(chan error), make(chan string)
)

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    if depth <= 0 {
        return
    }

    var (
        body string
        err error
        urls []string
    )

    if ok := sc.Get(url); !ok {
        sc.Set(url)
        body, urls, err = fetcher.Fetch(url)
    } else {
        err = fmt.Errorf("Already fetched: %s", url)
    }

    if err != nil {
        errs <- err
        return
    }

    ress <- fmt.Sprintf("found: %s %q\n", url, body)
    for _, u := range urls {
        go Crawl(u, depth-1, fetcher)
    }
    return
}

func main() {
    go Crawl("http://golang.org/", 4, fetcher)
    for {
        select {
        case res, ok := <-ress:
            fmt.Println(res)
            if !ok {
                break
            }
        case err, ok := <-errs:
            fmt.Println(err)
            if !ok {
                break
            }
        }
    }
}

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
    body string
    urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
    if res, ok := f[url]; ok {
        return res.body, res.urls, nil
    }
    return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
    "http://golang.org/": &fakeResult{
        "The Go Programming Language",
        []string{
            "http://golang.org/pkg/",
            "http://golang.org/cmd/",
        },
    },
    "http://golang.org/pkg/": &fakeResult{
        "Packages",
        []string{
            "http://golang.org/",
            "http://golang.org/cmd/",
            "http://golang.org/pkg/fmt/",
            "http://golang.org/pkg/os/",
        },
    },
    "http://golang.org/pkg/fmt/": &fakeResult{
        "Package fmt",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
    "http://golang.org/pkg/os/": &fakeResult{
        "Package os",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
}

1 answer:

Answer 0 (score: 1)

You can solve this problem with sync.WaitGroup:

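  1. You can start listening on the channels in separate goroutines.
  2. A WaitGroup keeps track of how many goroutines you have running.
  3. wg.Add(1) says we are about to start a new goroutine.

wg.Done() says a goroutine has finished.

wg.Wait() blocks the calling goroutine until all started goroutines have finished.

Together, these three methods coordinate the goroutines.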
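A minimal sketch of the WaitGroup approach described above. It is not the code behind the answer's playground link (that code is not reproduced here); the names `crawl`, `visited`, and the trimmed fetcher data are assumptions for illustration. The pattern: call `wg.Add(1)` before each `go`, `defer wg.Done()` inside the goroutine, and let one extra goroutine call `wg.Wait()` and then close the output channel, so `main` can simply `range` over it.

```go
package main

import (
	"fmt"
	"sync"
)

// Fetcher is the interface from the Go tour exercise.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

// visited is a hypothetical mutex-guarded set. Visit checks and marks a URL
// inside one critical section, so each URL is fetched at most once.
type visited struct {
	mu   sync.Mutex
	seen map[string]bool
}

func (v *visited) Visit(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[url] {
		return false
	}
	v.seen[url] = true
	return true
}

// crawl reports one line per URL on out. The caller has already called
// wg.Add(1) for this goroutine; the deferred wg.Done balances it.
func crawl(url string, depth int, f Fetcher, v *visited, out chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	if depth <= 0 || !v.Visit(url) {
		return
	}
	body, urls, err := f.Fetch(url)
	if err != nil {
		out <- err.Error()
		return
	}
	out <- fmt.Sprintf("found: %s %q", url, body)
	for _, u := range urls {
		wg.Add(1) // register the child before starting it, not inside it
		go crawl(u, depth-1, f, v, out, wg)
	}
}

// Crawl starts the crawl and returns a channel that is closed once every
// crawl goroutine has called wg.Done.
func Crawl(url string, depth int, f Fetcher) <-chan string {
	out := make(chan string)
	v := &visited{seen: make(map[string]bool)}
	var wg sync.WaitGroup
	wg.Add(1)
	go crawl(url, depth, f, v, out, &wg)
	go func() {
		wg.Wait()  // returns when the counter drops to zero
		close(out) // safe: no crawl goroutine can send any more
	}()
	return out
}

// fakeFetcher is a trimmed version of the tour's canned fetcher.
type fakeResult struct {
	body string
	urls []string
}

type fakeFetcher map[string]*fakeResult

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

func main() {
	fetcher := fakeFetcher{
		"http://golang.org/":     {"The Go Programming Language", []string{"http://golang.org/pkg/", "http://golang.org/cmd/"}},
		"http://golang.org/pkg/": {"Packages", []string{"http://golang.org/", "http://golang.org/cmd/"}},
	}
	// The range ends when Crawl closes the channel, so main cannot deadlock.
	for line := range Crawl("http://golang.org/", 4, fetcher) {
		fmt.Println(line)
	}
}
```

Sending results and errors on a single channel is a simplification chosen here; with two channels, the same `wg.Wait()` goroutine would close both. The print order varies between runs because the crawl goroutines race.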

Go playground link

PS. You may also be interested in sync.RWMutex for your SafeCache.
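A sketch of the question's SafeCache switched to sync.RWMutex, as the PS suggests: `Get` takes the shared read lock (`RLock`/`RUnlock`), so concurrent lookups do not serialize, while `Set` takes the exclusive lock. `SetIfAbsent` is a hypothetical addition: the question's `Crawl` calls `Get` and then `Set` as two separate critical sections, so two goroutines can both see a URL as unvisited and fetch it twice; doing the check and the write under one `Lock` closes that gap.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCache guarded by a sync.RWMutex: readers share RLock, writers take Lock.
type SafeCache struct {
	mux   sync.RWMutex
	cache map[string]bool
}

// Get only reads the map, so many goroutines may hold RLock at once.
func (c *SafeCache) Get(s string) bool {
	c.mux.RLock()
	defer c.mux.RUnlock()
	return c.cache[s]
}

// Set writes the map and therefore needs the exclusive lock.
func (c *SafeCache) Set(s string) {
	c.mux.Lock()
	defer c.mux.Unlock()
	c.cache[s] = true
}

// SetIfAbsent (hypothetical helper) checks and marks under one exclusive
// lock and reports whether this caller was the one that marked s.
func (c *SafeCache) SetIfAbsent(s string) bool {
	c.mux.Lock()
	defer c.mux.Unlock()
	if c.cache[s] {
		return false
	}
	c.cache[s] = true
	return true
}

func main() {
	c := SafeCache{cache: make(map[string]bool)}
	var wg sync.WaitGroup
	var mu sync.Mutex
	wins := 0
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if c.SetIfAbsent("http://golang.org/") {
				mu.Lock()
				wins++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(wins, c.Get("http://golang.org/")) // 1 true
}
```

Because a crawler mostly needs the check-and-mark step, which must take the exclusive lock anyway, the gain from RWMutex here is limited to pure reads such as `Get`.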