[Support Guide] Understanding and debugging prerendering

Hi, @dcastro, there is a site setting to enable prerendering, so it is possible that the setting just needs to be activated for that site. Also, once the setting is changed, a new deploy is needed before it takes effect. (I don’t know if this is the root cause, but it is a possibility.)

If this has already been done, please let us know which site this is for and we’ll be happy to take a close look to see why it isn’t working.

Thanks. And yes, the setting is already configured; I’ve disabled it, re-enabled it, and re-deployed, with no change.

If you email me directly, I can give you the site that is having issues.

I’ve escalated your case to our helpdesk since you are a Pro customer (you could have contacted us there directly, too); there you can tell us your site name so we can take a look :slight_smile:

Is there a list of UA strings that will trigger prerendering?
I have working OG tags and unfurling works pretty much everywhere but not when sharing via WhatsApp.

Hiya @mrazzari!

Yup. The list changes somewhat frequently, but today it is this CASE-INSENSITIVE SUBSTRING REGULAR EXPRESSION:

(baiduspider|twitterbot|facebookexternalhit|facebot|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|SocialFlow|Net::Curl::Simple|Snipcart|Googlebot|outbrain|pinterestbot|pinterest/0|slackbot|vkShare|W3C_Validator|redditbot|Mediapartners-Google|AdsBot-Google|parsely|DuckDuckBot|whatsapp|Hatena|Screaming Frog SEO Spider|bingbot|Sajaribot|DashLinkPreviews|Discordbot|RankSonicBot|lyticsbot|YandexBot/|YandexWebmaster/|naytev-url-scraper|newspicksbot/|Swiftbot/|mattermost|Applebot/|snapchat|viber)

I just tested “whatsapp - 1234” as a user agent and got prerendered content. Have you enabled prerendering on your site, and redeployed after doing so to activate it? It’s in the build & deploy settings page near the bottom :slight_smile:
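If you want to check a user agent against that list without deploying anything, you can run the same kind of case-insensitive substring match locally. A minimal sketch (the regex below is a trimmed subset of the full expression quoted above; paste the whole thing in real use):

```shell
# Does this UA string match the (case-insensitive) prerender bot regex?
# Trimmed subset of the full list, for illustration only.
ua_matches_bot() {
  printf '%s' "$1" | grep -qiE '(baiduspider|twitterbot|facebookexternalhit|facebot|whatsapp|slackbot|Discordbot|bingbot)'
}

ua_matches_bot "WhatsApp/2.23.20" && echo "would be prerendered"
ua_matches_bot "Mozilla/5.0 (ordinary browser)" || echo "would get the normal page"
```

This mirrors the curl test above: any UA containing one of those substrings, in any case, gets the prerendered response.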

Hi @fool, thanks for that list.

I should have tested that curl myself :slight_smile:

After doing so, and verifying that prerender is working for this UA, I realized the problem was Gatsby inlining all the CSS in the <head> before all the <meta> tags. And WhatsApp only downloads the first N bytes of the HTML response, so it wasn’t even reaching those meta tags. Quick fix: I made the CSS external.
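A quick way to check for this failure mode is to measure how deep into the HTML the first og: tag sits. A minimal sketch with a synthetic sample (in practice you would pipe in the output of something like `curl -s -A facebot https://your-site/`, where the URL is your own page):

```shell
# Report the byte offset of the first "og:" occurrence in HTML on stdin.
# If a crawler reads only the first N bytes, this offset must be under N.
first_og_offset() {
  grep -ob 'og:' | head -n 1 | cut -d : -f 1
}

# Synthetic sample: 100 bytes of inlined CSS padding before the meta tag.
sample="<head><style>$(printf 'x%.0s' $(seq 1 100))</style><meta property=\"og:title\" content=\"Hi\"></head>"
printf '%s' "$sample" | first_og_offset
```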

Thanks!

Dang, I hadn’t even considered that as a failure mode! I will update the doc at the beginning of this thread to mention it. TBQH, that sounds like a broken workflow to me (the way WhatsApp does the partial download, that is): it’s a gotcha that is very hard to discover and can quietly impact you without a clear cause.

Thanks so much for following up to let us know that detail, you have already taught one fool a new trick, but hopefully together, we can teach future explorers :slight_smile:

Turns out it’s not that uncommon; it just depends on what “N bytes” means for each crawler. E.g., for Facebook it’s the first 1MB, and the rest gets cut off. I’m pretty sure others do the same, but it’s hard to find conclusive info; e.g., Google’s cutoff is said to be ‘a few hundred MB’ :man_shrugging:

I am learning so much this week - thank you again!

A megabyte (or more) feels pretty unlikely to be a problem (put another way, your webpage has a problem if its HTML, even prerendered, is bigger than that before the OG tags), but a smaller N could have wide impact, particularly if N is not documented publicly.

However, I don’t get to tell them how to do business, so for now I super appreciate you taking the time to share all these details so others can discover them as well, hopefully with less head scratching!

Is there a way to force a rerender of the prerendered site? Let’s say the only thing I’ve updated is an OpenGraph image. Or is the only solution currently to wait out the 24-48h cache period?

Our staff can flush the cache for you, but it is not likely we’ll see your request in a timely enough fashion to do so. Unfortunately, you didn’t include the URL here, so I couldn’t “just do it” when I did happen to see this quickly…

Hello @fool,

It looks like Facebook does not use the pre-rendered page.
When we try with the Facebook Sharing Debugger, it retrieves the default page instead of the pre-rendered version with the meta tags. The pre-rendered page works for all other crawlers I tested. We are within the 1MB cutoff limit Facebook suggests.

By running curl to simulate the Facebook crawler we get the pre-rendered page. The real Facebook crawler doesn’t.

Could you tell us a URL that shows this behavior, please?

Thanks for the fast response, here is a link you can test:

OK. We are definitely sending the prerendered content to Facebook. You can test it yourself:

curl -v -A facebot https://share.100mentors.com

…returns different content than:

curl -v https://share.100mentors.com

So I guess you’ll need to work with Facebook to debug what’s happening if it doesn’t work like you’re expecting, and if you didn’t have any luck following the debug steps above (which will likely help you expose the problem).

The docs state the prerender is cached for 24-48 hours.

Does this happen for every deploy? Or does the cache only get updated once a day or every other day regardless of the latest deploy?

The cache is created on demand: when we get a request for a prerendered page that isn’t already in the cache, we attempt to prerender it and, if successful, store it in that 24-48h cache (the expiry is implemented something like: “is this asset in our prerender cache more than 24 hours old? Then we will schedule it for removal sometime in the next 24 hours.”)

This cache is handled 100% independently of any deploy: whatever is live at the moment the request is received gets prerendered and cached for 24-48h, regardless of how many times you deploy or what you change after that.
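The expiry rule described above can be sketched as a tiny decision function (illustrative only, not Netlify’s actual implementation; 86400 seconds is 24h, and exactly when within the second 24h window removal happens is an internal detail):

```shell
# Illustrative sketch of the 24-48h expiry rule described above.
cache_state() {
  age_seconds=$1
  if [ "$age_seconds" -gt 86400 ]; then   # older than 24h?
    echo "eligible: schedule removal sometime in the next 86400s"
  else
    echo "fresh: serve from cache"
  fi
}

cache_state 90000   # cached 25h ago
cache_state 3600    # cached 1h ago
```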

Hello and thanks for the reply.
We figured out what was happening.
Facebook was crawling the root page because og:url was set to our root page instead of the full URL.
Other crawlers that worked probably don’t check og:url.
Prerendering is working fine!
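This pitfall is easy to check for: compare each page’s og:url tag to the URL you actually shared, since Facebook follows og:url and scrapes that address instead. A minimal sketch with synthetic values (the example.com URLs are placeholders):

```shell
# Extract og:url from a page's HTML and compare it to the shared URL.
# If they differ, Facebook shows the tags of the og:url page instead.
page_html='<meta property="og:url" content="https://example.com/" />'
shared_url='https://example.com/some/deep/page'

og_url=$(printf '%s' "$page_html" | sed -n 's/.*og:url" content="\([^"]*\)".*/\1/p')
if [ "$og_url" = "$shared_url" ]; then
  echo "og:url matches the shared URL"
else
  echo "mismatch: og:url points at $og_url"
fi
```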

Are there any plans to support an endpoint like POST /recache?

In some cases, the best approach might be to set the cache time to infinite (instead of 24-48h) and send a POST /recache request from the API server whenever any relevant data changes. This approach makes sure the cached page is never out of sync, and it might also be more efficient overall.

Or is it expected to use prerender.io if this is needed?
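For what it’s worth, the flow I have in mind might look like this; everything here is hypothetical (the /recache endpoint does not exist, and the names are invented), shown as a dry run that only prints the request it would send:

```shell
# Hypothetical sketch of the proposed cache-busting call; /recache is NOT
# a real Netlify endpoint. Dry run: print the request instead of sending it.
recache_request() {
  echo "POST /recache {\"url\": \"$1\"}"
}

recache_request "https://example.com/updated-page"
```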

That’s an interesting suggestion, but not compatible with our current implementation (since it would break current behavior for tens of thousands of existing sites relying on automatic expiry). I have added the suggestion to our larger feature request around “letting customers manipulate the prerender cache”, since it would be a very nice optional feature for folks who are as savvy as you :slight_smile:
