Question

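I want to scrape some data from a website whose robots.txt file contains this content. Does this mean I can scrape everything apart from wp-admin? Is there any other way to find out whether the site allows scraping/crawling without any restrictions? For scraping, I am using the Python Scrapy framework.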

public static String encrypt(File publicKeyFile, String policy, String inputstr) throws IOException, AbeEncryptionException {
    AbePublicKey publicKey = AbePublicKey.readFromFile(publicKeyFile);
    try (InputStream in = new ByteArrayInputStream(inputstr.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        encrypt(publicKey, policy, in, out);
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }
}

Answer 1

Newer versions of Scrapy introduced a setting, ROBOTSTXT_OBEY. When it is enabled, Scrapy strictly follows the site's robots.txt.
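For reference, a minimal sketch of how this setting is usually written, assuming a standard Scrapy project layout where project-wide settings live in settings.py (the file comment below is illustrative, not taken from the question):

```python
# settings.py of a Scrapy project (assumed standard project layout)

# When True, Scrapy downloads the site's robots.txt first and skips
# any request that the file disallows for the crawler's user agent.
ROBOTSTXT_OBEY = True
```

Setting it to False turns the check off, so Scrapy no longer filters requests against robots.txt.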

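In projects generated by scrapy startproject, it is set to True in settings.py.

As noted in the comments, the documentation does list the default as False. That is the framework-level default, kept for historical reasons. The generated project template enables the setting, so new projects obey robots.txt unless you change it.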
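To check by hand what a given robots.txt permits, Python's standard urllib.robotparser module can evaluate the rules for you. The following is a minimal sketch, not a definitive tool. The rules in ROBOTS_TXT are a hypothetical example that disallows only /wp-admin/ (the actual file from the question is not reproduced here), and the example.com URLs and the is_allowed helper are likewise made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the real file from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(url, user_agent="*", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A path outside /wp-admin/ is not matched by any Disallow rule.
print(is_allowed("https://example.com/blog/post-1"))           # True
# A path under /wp-admin/ matches the Disallow rule.
print(is_allowed("https://example.com/wp-admin/options.php"))  # False
```

Under these assumed rules, everything outside /wp-admin/ is open to crawlers as far as robots.txt is concerned. Note that robots.txt only expresses crawling rules; a site's terms of service may add restrictions that this check does not see.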

What does this line in robots.txt mean?
