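I am using the new Webmaster Tools API to fetch all crawl errors (plus details) for my site. Unfortunately, it only gives me 1,000, but I have 10,000. Is there a way to get all of them?
Here is the code I am using: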
package main;
import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.auth.oauth2.GoogleTokenResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.webmasters.Webmasters;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSample;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSamplesListResponse;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
public class WebmastersCommandLine {
    private static final String CLIENT_ID = "...";
    private static final String CLIENT_SECRET = "...";
    private static final String REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob";
    private static final String OAUTH_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly";
    private static final String PAGE_URL = "...";

    public static void main(String[] args) throws IOException {
        HttpTransport httpTransport = new NetHttpTransport();
        JsonFactory jsonFactory = new JacksonFactory();

        // Build the OAuth 2.0 flow and ask the user to authorize in a browser.
        GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
                httpTransport, jsonFactory, CLIENT_ID, CLIENT_SECRET, Arrays.asList(OAUTH_SCOPE))
                .setAccessType("online")
                .setApprovalPrompt("auto").build();
        String url = flow.newAuthorizationUrl().setRedirectUri(REDIRECT_URI).build();
        System.out.println("open URL:");
        System.out.println(" " + url);
        System.out.println("code:");

        // Read the authorization code pasted by the user and exchange it for a token.
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String code = br.readLine();
        GoogleTokenResponse response = flow.newTokenRequest(code).setRedirectUri(REDIRECT_URI).execute();
        GoogleCredential credential = new GoogleCredential().setFromTokenResponse(response);

        // Create a new authorized API client
        Webmasters service = new Webmasters.Builder(httpTransport, jsonFactory, credential)
                .setApplicationName("WebmastersCommandLine")
                .build();

        // List the "notFound" crawl error samples for the "web" platform.
        Webmasters.Urlcrawlerrorssamples.List req2 =
                service.urlcrawlerrorssamples().list(PAGE_URL, "notFound", "web");
        try {
            UrlCrawlErrorsSamplesListResponse urlList = req2.execute();
            System.out.println("start");
            for (UrlCrawlErrorsSample sample : urlList.getUrlCrawlErrorSample()) {
                // Fetch the details (including linking pages) for each sample URL.
                Webmasters.Urlcrawlerrorssamples.Get req3 =
                        service.urlcrawlerrorssamples().get(PAGE_URL, sample.getPageUrl(), "notFound", "web");
                UrlCrawlErrorsSample details = req3.execute();
                System.out.println(sample.getPageUrl() + "," + details.getUrlDetails().getLinkedFromUrls());
            }
        } catch (IOException e) {
            System.out.println("An error occurred: " + e);
        }
        System.out.println("done");
    }
}
However, this only returns a list of 1,000 errors, and I need all 10,000. Does anyone know a way to do this?
Answer 0 (score: 1)
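The Webmaster Tools API URL Crawl Errors Sample method returns a sample of 1,000 crawl errors. It is not meant to return a complete list (which you could compile from your server logs). If you want more samples through the API, one thing you can do is mark these errors as fixed and check back in a day. It will then generate a new set of samples from the remaining crawl errors.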
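If you go the "mark as fixed and come back tomorrow" route, you will probably want to merge each day's batch into one deduplicated list. Below is a minimal sketch of that bookkeeping only. The class `CrawlErrorAccumulator` and its methods are hypothetical names made up for illustration, and the batches are plain lists of URL strings rather than real API responses. In a real run, each batch would come from the `list` call in the question, and the URLs reported as new are the ones you would then mark as fixed (I believe the client exposes this as `urlcrawlerrorssamples().markAsFixed(...)`, but check that against the API reference).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: collects sample URLs across several daily runs.
public class CrawlErrorAccumulator {
    // URLs collected so far, in first-seen order, without duplicates.
    private final Set<String> seen = new LinkedHashSet<>();

    // Merges one batch of sample URLs and returns only those not seen before.
    public List<String> merge(List<String> batch) {
        List<String> fresh = new ArrayList<>();
        for (String url : batch) {
            if (seen.add(url)) {
                fresh.add(url);
            }
        }
        return fresh;
    }

    // Number of distinct URLs collected so far.
    public int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        CrawlErrorAccumulator acc = new CrawlErrorAccumulator();

        // Stand-ins for two daily batches; real batches would hold up to 1,000 URLs.
        List<String> day1 = Arrays.asList("/a", "/b", "/c");
        List<String> day2 = Arrays.asList("/c", "/d");

        System.out.println("new on day 1: " + acc.merge(day1));
        System.out.println("new on day 2: " + acc.merge(day2));
        System.out.println("total distinct: " + acc.size());
    }
}
```

A `LinkedHashSet` keeps first-seen order, which roughly preserves the priority ordering of the samples across batches.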
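The samples are ordered the same way as in the UI, so the more important ones are the ones you see first. This means diminishing returns as you continue: later crawl errors tend to be similar to earlier ones, or at least are considered less important. The original blog post has more on the prioritization:
We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it's linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search.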