Webmaster Tools API: getting more than 1000 crawl errors

Time: 2014-12-21 14:32:49

Tags: java api google-webmaster-tools

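I'm using the new Webmaster Tools API to fetch all crawl errors (plus details) for my site. Unfortunately, it only gives me 1000, but I have 10000. Is there a way to get all of them?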

Here is the code I'm using:

package main;

import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.auth.oauth2.GoogleTokenResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;

import com.google.api.services.webmasters.Webmasters;
import com.google.api.services.webmasters.Webmasters.Urlcrawlerrorssamples;
import com.google.api.services.webmasters.model.SitesListResponse;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSample;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSamplesListResponse;
import com.google.api.services.webmasters.model.WmxSite;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;


public class WebmastersCommandLine {

  private static String CLIENT_ID = "...";
  private static String CLIENT_SECRET = "...";

  private static String REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob";

  private static String OAUTH_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly";

  private static String PAGE_URL = "...";

  public static void main(String[] args) throws IOException {
    HttpTransport httpTransport = new NetHttpTransport();
    JsonFactory jsonFactory = new JacksonFactory();

    GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
        httpTransport, jsonFactory, CLIENT_ID, CLIENT_SECRET, Arrays.asList(OAUTH_SCOPE))
        .setAccessType("online")
        .setApprovalPrompt("auto").build();

    String url = flow.newAuthorizationUrl().setRedirectUri(REDIRECT_URI).build();
    System.out.println("open URL:");
    System.out.println("  " + url);
    System.out.println("code:");
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    String code = br.readLine();

    GoogleTokenResponse response = flow.newTokenRequest(code).setRedirectUri(REDIRECT_URI).execute();
    GoogleCredential credential = new GoogleCredential().setFromTokenResponse(response);

    // Create a new authorized API client
    Webmasters service = new Webmasters.Builder(httpTransport, jsonFactory, credential)
        .setApplicationName("WebmastersCommandLine")
        .build();

    Webmasters.Urlcrawlerrorssamples.List req2 = service.urlcrawlerrorssamples().list(PAGE_URL, "notFound", "web");

    try
    {
        UrlCrawlErrorsSamplesListResponse urlList = req2.execute();

        System.out.println("start");

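        // The client returns null (not an empty list) when there are no samples.
        List<UrlCrawlErrorsSample> samples = urlList.getUrlCrawlErrorSample();
        if (samples == null) samples = new ArrayList<UrlCrawlErrorsSample>();

        for(UrlCrawlErrorsSample sample : samples)
        {
            // Fetch the details (e.g. the linking pages) for each sampled URL.
            Webmasters.Urlcrawlerrorssamples.Get req3 = service.urlcrawlerrorssamples().get(PAGE_URL, sample.getPageUrl(), "notFound", "web");
            UrlCrawlErrorsSample details = req3.execute();

            // urlDetails may be missing for some samples.
            Object linkedFrom = details.getUrlDetails() == null ? null : details.getUrlDetails().getLinkedFromUrls();
            System.out.println(sample.getPageUrl() + "," + linkedFrom);
        }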

    }
    catch(IOException e)
    {
        System.out.println("An error occurred: " + e);
    }

    System.out.println("done");
  }

}

However, this only returns a list of 1000 errors, and I need all 10000. Does anyone know a way to do this?

1 answer:

Answer 0: (score: 1)

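The Webmaster Tools API's URL Crawl Errors Samples method returns a sample of up to 1000 crawl errors. It is not meant to return a complete list (which you can compile from your server logs). If you want more samples through the API, one thing you can do is mark these errors as fixed and check back in a day. It will then generate a new set of samples from the remaining crawl errors.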
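A minimal sketch of that "record, mark as fixed, check back later" loop, under stated assumptions. `SampleSource`, `collectRound`, and `FakeSource` are hypothetical names introduced only for illustration; they are not part of the Google client library. In a real program the two interface methods would presumably wrap the `urlcrawlerrorssamples().list(...)` call from the question and the API's mark-as-fixed method, and each round would run roughly a day apart with the `seen` set persisted in between. Marking errors as fixed is a write operation, so the `webmasters.readonly` scope used in the question is probably not sufficient for it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CrawlErrorCollector {

    /** Hypothetical abstraction over the two API operations (not a library type). */
    interface SampleSource {
        /** Returns the current batch of sampled error URLs (capped per call). */
        List<String> listSampleUrls();

        /** Marks one sampled URL as fixed so it drops out of later batches. */
        void markAsFixed(String pageUrl);
    }

    /**
     * One round: remember every sampled URL not seen before, then mark it as
     * fixed. Returns how many new URLs this round contributed.
     */
    static int collectRound(SampleSource source, Set<String> seen) {
        int added = 0;
        for (String url : source.listSampleUrls()) {
            if (seen.add(url)) {
                added++;
            }
            source.markAsFixed(url);
        }
        return added;
    }

    /** In-memory stand-in for the API, used only to make the sketch runnable. */
    static class FakeSource implements SampleSource {
        private final List<String> remaining;
        private final int batchSize;

        FakeSource(List<String> allErrors, int batchSize) {
            this.remaining = new ArrayList<>(allErrors);
            this.batchSize = batchSize;
        }

        @Override
        public List<String> listSampleUrls() {
            int end = Math.min(batchSize, remaining.size());
            // Return a copy so markAsFixed can modify 'remaining' during iteration.
            return new ArrayList<>(remaining.subList(0, end));
        }

        @Override
        public void markAsFixed(String pageUrl) {
            remaining.remove(pageUrl);
        }
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            all.add("/missing-" + i);
        }
        SampleSource source = new FakeSource(all, 10); // batch cap of 10 instead of 1000
        Set<String> seen = new LinkedHashSet<>();

        int rounds = 0;
        while (collectRound(source, seen) > 0) {
            rounds++; // in practice: wait about a day before the next round
        }
        System.out.println(seen.size() + " URLs collected in " + rounds + " rounds");
    }
}
```

Using a `LinkedHashSet` keeps the URLs in the order they were first sampled, which preserves the importance ordering described in this answer, and deduplicates any URL that reappears in a later batch.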

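The samples are in the same order as in the UI, so the more important ones are the ones you see first. This means diminishing returns as you go on: later crawl errors will be similar to earlier ones, or at least considered less important. The original blog post has more on the prioritization: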

  

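"We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it's linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search."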