It’s good news for those who were disappointed that the specifics of crawl errors are being removed from Google Webmaster Tools: they can still be accessed via the GData API. Through this API, it’s possible to retrieve up to 100,000 URLs for each error type, along with most of the missing detail.
Details about crawl errors are available in two forms:
- A CSV download: SEO experts in India can download eight CSV files, one of which contains a list of all crawl errors.
- The crawl errors feed, which can be fetched programmatically, 25 errors at a time.
Different parts of the data are available in four ways:
- User interface-based CSV download
- User interface display
- API-based download
- API-based feed
CSV Download:
The eight CSV files which are available through the API for SEO services are:
- Crawl Errors
- Content Errors
- Top Pages
- Top Queries
- Content Keywords
- Internal Links
- External Links
- Social Activity
To download these CSV files, SEO experts in India can either build their own client library or use the PHP client library.
The crawl errors CSV contains the following data:
- Up to 100,000 URLs for each type of error
- The full list of URLs blocked by robots.txt
- Specifics of “not followed” errors (unlike the UI, which shows only the status code returned by the URL)
- Specifics of site-wide server errors
- All specifics of “soft 404s”
Details about crawl error sources are available through the crawl errors feed, as described below:
Crawl Errors Feed
Although the crawl errors feed request code appears to be built into the Java and Objective-C client libraries, custom code has to be written to request the feed if a different library is being used. A looping program can fetch 25 errors at a time. The returned information is in the following format:
<atom:entry>
<atom:id>id</atom:id>
<wt:crawl-type>web-crawl</wt:crawl-type>
<wt:issue-type>http-error</wt:issue-type>
<wt:url>http://example.com/dir/</wt:url>
<wt:detail>4xx Error</wt:detail>
<wt:linked-from>http://example.com</wt:linked-from>
<wt:date-detected>2008-11-17T01:06:10.000</wt:date-detected>
</atom:entry>
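For anyone writing that request code by hand, the looping fetch and the parsing of each entry could look roughly like the Python sketch below. It is only an illustration under several assumptions: the two namespace URIs are not shown in the sample above and are assumed here, paging is assumed to use a 1-based start index with a page size of 25, and fetch_page is a hypothetical callable that you supply to perform the authenticated HTTP request and return the feed XML as a string.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the sample entry shows only the prefixes.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "wt": "http://schemas.google.com/webmasters/tools/2007",
}

PAGE_SIZE = 25  # the feed is described as returning 25 errors at a time


def _text(entry, tag):
    """Return the stripped text of a child element, or None if it is absent."""
    node = entry.find(tag, NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def parse_crawl_errors(feed_xml):
    """Turn one page of feed XML into a list of dicts, one per entry."""
    root = ET.fromstring(feed_xml)
    errors = []
    for entry in root.iter("{%s}entry" % NS["atom"]):
        errors.append({
            "id": _text(entry, "atom:id"),
            "crawl_type": _text(entry, "wt:crawl-type"),
            "issue_type": _text(entry, "wt:issue-type"),
            "url": _text(entry, "wt:url"),
            "detail": _text(entry, "wt:detail"),
            "linked_from": _text(entry, "wt:linked-from"),
            "date_detected": _text(entry, "wt:date-detected"),
        })
    return errors


def fetch_all_crawl_errors(fetch_page):
    """Request pages of 25 until a short (or empty) page signals the end.

    fetch_page(start_index, max_results) is a hypothetical callable that
    returns the XML of one page; it is not part of any client library.
    """
    results = []
    start = 1  # assumed 1-based start index
    while True:
        page = parse_crawl_errors(fetch_page(start, PAGE_SIZE))
        results.extend(page)
        if len(page) < PAGE_SIZE:
            break
        start += PAGE_SIZE
    return results
```

Keeping the HTTP call outside the loop means the same code works with whichever client library or authentication method is in use, and the parser can be tested against a saved copy of the feed.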






