Fix prerendering for Ahrefs

Hi,

My Netlify site name is remoteambition.

We’ve enabled the prerendering service on Netlify. I have an issue where the bot for ahrefs (an SEO service) is not being recognized as a crawler. This causes it to be served the user-facing single-page app when it requests pages from my site (and messes up any reporting). I saw that you previously enabled this for SemrushBot – can you do the same here?

According to their docs (AhrefsBot: Learn About the Ahrefs' Web Crawler), they’re the second most active crawler after Googlebot.

They use one of three user-agent strings:

  • Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
  • Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/)
  • Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/)

Can you add AhrefsBot and AhrefsSiteAudit to your regexp matcher for crawlers?
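For illustration, a crawler matcher of the kind described here might look like the sketch below. Netlify's actual pattern isn't shown in this thread, so `CRAWLER_UA_PATTERN`, `is_crawler`, and the non-Ahrefs tokens in it are hypothetical. The point is only that case-insensitive `AhrefsBot` and `AhrefsSiteAudit` tokens catch all three user agents listed above, including the mobile one that otherwise looks like Chrome on Android.

```python
import re

# Hypothetical crawler pattern for illustration only; Netlify's real list is
# not shown in this thread. Matching is case-insensitive on product tokens.
CRAWLER_UA_PATTERN = re.compile(
    r"googlebot|bingbot|semrushbot|ahrefsbot|ahrefssiteaudit",
    re.IGNORECASE,
)


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    return CRAWLER_UA_PATTERN.search(user_agent) is not None


# The three user-agent strings Ahrefs documents (copied from the post above).
AHREFS_UAS = [
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
    "Mobile Safari/537.36 (compatible; AhrefsSiteAudit/6.1; "
    "+http://ahrefs.com/robot/)",
]

# A typical desktop Chrome UA that should NOT be treated as a crawler.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    for ua in AHREFS_UAS + [BROWSER_UA]:
        print(is_crawler(ua), ua)
```

Matching on the product token rather than the full string keeps the rule working when Ahrefs bumps the version number (for example, `AhrefsBot/7.0` to a later release).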

Blake

Hi, I haven’t gotten a reply in six days and I’m wondering if someone from support can help. Thanks.

Hi @nfuser,

We’ve begun the process of adding this UA. We’re now waiting for the developers to confirm and release it into production. We’ll let you know once that happens. Thank you for your patience.

This change has rolled out across our CDN.


That’s great, thank you for rolling this out.

Hi, may I disagree with the decision to add Ahrefs, or is there a way to specifically turn off crawling by them? I don’t believe this is the right decision. They are not an SEO service by any stretch of the imagination: they crawl sites so that they can sell that data to anybody who wants to compare multiple services. I certainly do not want Ahrefs crawling any of my sites. They are a nuisance.

Hey @zehawki,

Wouldn’t it be possible to include a robots.txt to block that bot?
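As a sketch of what that could look like, the `robots.txt` below asks both Ahrefs crawlers to skip the whole site while leaving other bots alone. It is checked here with Python's standard `urllib.robotparser`, which stands in for any crawler that honors the file. `example.com` is a placeholder, and note that this parser matches on the product token (`AhrefsBot`), not the full user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block both Ahrefs crawlers site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: AhrefsSiteAudit
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would after fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # robotparser compares against the product token, so pass "AhrefsBot",
    # not the full "Mozilla/5.0 (compatible; AhrefsBot/7.0; ...)" string.
    for agent in ("AhrefsBot", "AhrefsSiteAudit", "Googlebot"):
        print(agent, rp.can_fetch(agent, "https://example.com/jobs/"))
```

This only expresses a request: a crawler that ignores `robots.txt` will still fetch the pages.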

Hi, I thought of that, but it depends on Netlify’s processing flow, yes? Since prerendering is triggered by the UA, wouldn’t the prerendered response be returned before robots.txt is checked? Happy to be wrong about this…

Yeah, you could be right. I’ll get some clarification on that.

Hey @zehawki,

I asked the developers, and according to them, a bot should in theory check robots.txt before trying to crawl the site. However, whether it respects that file is up to the bot. At least their docs suggest that they respect it.

Since robots.txt is not an HTML file, Netlify won’t prerender it, so the bot should be able to read it before crawling the rest of the content.
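A rough sketch of the routing rule described here, assuming prerendering is triggered only when a crawler user agent requests an HTML page. Netlify's real logic isn't public in this thread, so `should_prerender`, `looks_like_html_page`, and the extension-based HTML check are hypothetical simplifications. Under that assumption, a request for `/robots.txt` is served as the plain static file even to AhrefsBot.

```python
import posixpath
import re

# Hypothetical crawler pattern (same idea as a UA regexp matcher).
CRAWLER_UA_PATTERN = re.compile(
    r"googlebot|semrushbot|ahrefsbot|ahrefssiteaudit", re.IGNORECASE
)


def looks_like_html_page(path: str) -> bool:
    """Simplification: extension-less routes (SPA pages) and .html/.htm files count as HTML."""
    ext = posixpath.splitext(path)[1].lower()
    return ext in ("", ".html", ".htm")


def should_prerender(path: str, user_agent: str) -> bool:
    """Prerender only HTML page requests from crawler user agents (assumed rule)."""
    is_crawler = CRAWLER_UA_PATTERN.search(user_agent) is not None
    return is_crawler and looks_like_html_page(path)


if __name__ == "__main__":
    ahrefs = "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
    for path in ("/robots.txt", "/", "/jobs/remote"):
        print(path, should_prerender(path, ahrefs))
```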

Ahhh yes, of course. Ahrefs needs to honor that, and it’s nothing to do with Netlify. My bad. Thanks.
