Can Crawler4j be run from another class?

Time: 2015-01-26 02:22:16

Tags: crawler4j

I need to call Crawler4j from another class. Instead of using a main method in the Controller class, I used a simple method named setup.

class Controller {
    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 1;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);
            config.setMaxDepthOfCrawling(1);

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed(seed);
            controller.setCustomData(seed);
            controller.start(MyCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I tried calling it from another class, but I get an error.

Controller c = new Controller();
c.setup(seed);

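Is it possible to run crawler4j without a main method in the Controller class? In short, I want to know how to integrate the crawler into an application that already has a main method. Any help would be appreciated.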

2 answers:

Answer 0: (score: 0)

There should be no problem running the crawler this way. The code below has been tested and works as expected:

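import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

// BasicCrawler is a WebCrawler subclass (presumably the one from the
// crawler4j examples); substitute your own crawler class and import it.
public class Controller {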

    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 4;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);
            config.setMaxDepthOfCrawling(2);

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed(seed);
            controller.setCustomData(seed);
            controller.start(BasicCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        Controller crawler = new Controller();
        crawler.setup("http://www.ics.uci.edu/");
    }
}

Answer 1: (score: 0)

Sorry, I forgot the access modifier "public" before the class name. That was the cause of the error. Thanks for your answer.
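For readers who hit the same error: in Java, a top-level class declared without an access modifier is package-private, so it can only be referenced from classes in the same package. Presumably the calling class here lived in a different package. The sketch below is only an illustration of that rule using reflection; the names PackagePrivateController, VisibilityCheck and PublicController are hypothetical stand-ins and are not part of crawler4j or the original code.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the question's Controller: no access modifier,
// so the class is package-private.
class PackagePrivateController {
    public void setup(String seed) { }
}

class VisibilityCheck {
    // Hypothetical counterpart declared public, for comparison.
    // (Nested here only so the example fits in a single file.)
    public static class PublicController {
        public void setup(String seed) { }
    }

    /** Returns true if the class itself is declared public. */
    static boolean isPublicClass(Class<?> type) {
        return Modifier.isPublic(type.getModifiers());
    }

    public static void main(String[] args) {
        // prints "false": a public method does not make its class public
        System.out.println(isPublicClass(PackagePrivateController.class));
        // prints "true"
        System.out.println(isPublicClass(PublicController.class));
    }
}
```

Note that setup being public does not help when the class is not: code in another package cannot even name the class, so declaring it as public class Controller, as in Answer 0, is what makes the call compile.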