
Google Search Console: Indexed, though blocked by robots.txt

For a few weeks now I’ve struggled to resolve errors in the Google Search Console.

After every validation I’m still being told that pages are “Indexed, though blocked by robots.txt”.

Yet I can access the robots.txt directly without issue:

And I can’t see any errant headers either.
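For anyone repeating these two checks, here is a minimal sketch of how they could be scripted offline. It assumes you have already saved the robots.txt body and the response headers for the affected page. The helper names `is_crawl_allowed` and `blocking_headers` are hypothetical, and Python's `urllib.robotparser` may not match Googlebot's own rule matching exactly, so treat the result as a hint rather than a verdict.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Parse a robots.txt body and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def blocking_headers(headers: Dict[str, str]) -> List[str]:
    """Return X-Robots-Tag values that contain a 'noindex' or 'none' directive.

    Simplification (assumption): user-agent-prefixed values such as
    'googlebot: noindex' are not split out and will not be detected.
    """
    found = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives or "none" in directives:
                found.append(value)
    return found


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the site in question.
    sample_robots = "User-agent: *\nDisallow: /private/\n"
    print(is_crawl_allowed(sample_robots, "https://example.com/"))
    print(blocking_headers({"X-Robots-Tag": "noindex, nofollow"}))
```

If both checks come back clean, as they did for me by hand, the block is probably not coming from the current robots.txt or headers.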

Ref for one request from my local machine:
x-nf-request-id: cbe56d8c-213e-49fb-94a3-d827b5c362fc-42946586

Naturally I don’t have the refs for Google requests, but here’s the page crawl data:

Last crawl - 16 Nov 2020, 23:02:34
Crawled as - Googlebot smartphone
Crawl allowed? - Yes
Page fetch - Failed: Crawl anomaly

Search engines often don’t behave as expected, because it can take days, weeks, or even months in some cases for the latest data to show up. I don’t think this has anything to do with Netlify. You might get more help with this on the Google Search Console forums.

But then wouldn’t that mean that clicking Validate Fix doesn’t actually check whether the issue is fixed, but instead re-checks whether the cached data is still broken? I know Google can do some dumb things, but that one doesn’t make any sense.

Either way I will post on the GSC forums, thanks.

That reasoning makes sense to people like you and me. But there’s no way to know how long the error will keep showing up.

I was getting ‘Server errors’ for various pages myself for about 3 to 4 months, even though the pages were loading just fine, including in the Live Test tool. So, as far as I know, no definitive conclusion can be drawn.