Prerendering system is accessing mp3 files

production-splitter.netlify.app
https://www.edityouraudio.com

Hi all

I’ve just experienced the following problem:

My Azure storage just got crazy expensive, and when I reviewed the logs, I realized most of the requests for my files were coming from a prerendering engine whose user-agent is shown in the log below.

Example of log:
1.0;2021-04-29T23:57:51.2225258Z;GetBlob;AnonymousSuccess;200;219;170;anonymous;;splitter;blob;"https://splitter.blob.core.windows.net:443/uploads/youtube/tZUhPWIloLA/tggyl9ce62l/subtitles.json?1619740670966";"/splitter/uploads/youtube/tZUhPWIloLA/tggyl9ce62l/subtitles.json";d1f9f7b3-901e-0062-7753-3d23d8000000;0;35.171.128.233:60696;2009-09-19;528;0;574;6458;0;;;"0x8D8F6409D171A63";Saturday, 03-Apr-21 01:34:24 GMT;;"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/74.0.3729.108 Safari/537.36 Prerender (+https://github.com/prerender/prerender)";"Edit Your Audio";

Is this your scraping user-agent?
If yes, why do you try to access mp3 files for pre-rendering?

Thanks for your help

howdy martinratinaud,

interesting! i have passed this question on to the person who knows most about our prerendering, hopefully we can get some answers in a day or two when he is back. stay tuned.

hi again - just dropping in to let you know we were a bit underwater this week, but we won’t forget about you and are going to get you an answer of some kind soon. thanks for your patience! :pray:

Hi, @martinratinaud. When prerendering is enabled, for the majority of site traffic, nothing changes.

The difference occurs for site traffic where the request sends a user-agent header which triggers our prerendering service. For those requests, the following occurs:

  • a headless Chrome instance loads the page, which includes executing the site javascript
  • the resulting HTML (after running the site javascript) is returned to the client that originally requested the page

If the site’s HTML or javascript indicates the MP3 above should be loaded, then that is what will occur.

So, this is not us “scraping” the page. This is the prerendering service responding to incoming HTTP requests with a user-agent which indicates we should be prerendering for it (user-agents like Twitterbot, Googlebot, Bingbot, etc.).
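To illustrate the user-agent check described above, here is a minimal sketch. It is not the service's actual code: the pattern list only contains the three bots named in this post, and the names `PRERENDER_BOT_PATTERNS`, `wantsPrerender` and `chooseHandler` are made up for the example.

```javascript
// Hypothetical list: only the three bots named above; the real list is not shown in this thread.
const PRERENDER_BOT_PATTERNS = [/twitterbot/i, /googlebot/i, /bingbot/i];

// Returns true when the User-Agent header matches one of the bot patterns.
function wantsPrerender(userAgent) {
  if (typeof userAgent !== "string") return false;
  return PRERENDER_BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Sketch of the routing decision: matching bots get prerendered HTML,
// every other request is served the site as usual.
function chooseHandler(userAgent) {
  return wantsPrerender(userAgent) ? "prerender" : "static";
}
```

A regular visitor's browser never matches, which is why nothing changes for the majority of traffic.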

The prerendering service is only triggered in two cases:

  • when a new deploy occurs the prerendering service is used to generate a screenshot of the deploy (regardless of whether prerendering is enabled or not)
  • while prerendering is enabled, an external request is received with a user-agent requiring prerendering

Our prerendering service runs your site’s code and does exactly what it says to do. It behaves no differently than any other web browser would when fetching the page, including any MP3s it loads.
It is a full browser executing the site javascript. It appears that the HTML or javascript of your site loads that file, so when prerendering is enabled, the prerender loads it just like any other browser would.
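Since the headless browser identifies itself with "Prerender" in its user-agent (see the log line earlier in the thread), one possible workaround on the site side is to skip the audio download for it. This is only a sketch under that assumption; `isPrerenderBrowser`, `loadAudioUnlessPrerender` and the `loadAudio` callback are hypothetical names, not part of any library.

```javascript
// True when the user-agent string carries the "Prerender" marker seen in the log line.
function isPrerenderBrowser(userAgent) {
  return typeof userAgent === "string" && userAgent.includes("Prerender");
}

// Hypothetical helper: calls loadAudio() only for regular visitors.
// Returns true if the audio load was started, false if it was skipped.
function loadAudioUnlessPrerender(userAgent, loadAudio) {
  if (isPrerenderBrowser(userAgent)) return false;
  loadAudio();
  return true;
}
```

In the page you would pass `navigator.userAgent` as the first argument and whatever function currently starts the MP3 download as the second.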

Note, our prerendering service is a fork of prerender.io’s open source project, the same repository named in the user-agent above:

https://github.com/prerender/prerender

Their best practices page has a possible solution for this. The solution would be to run the following before loading the MP3s to indicate that the page is ready for prerendering:

window.prerenderReady = true;

This would stop the prerendering service and send the HTML as it existed at that time, specifically, a time before the MP3s are loaded. However, this is only a solution if the source of the MP3s being loaded is the site javascript and you have some way of running the code above before this happens.
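As a rough sketch of that ordering, the helper below sets the flag first and only then starts the audio download. `markReadyThenLoadAudio` and `startAudioLoad` are hypothetical names for the example. Whether the snapshot is really taken before the MP3 request goes out depends on the prerenderer's timing, which I have not verified, so you may need to delay the audio load a little. prerender.io's documentation also suggests setting `window.prerenderReady = false` early in the page so the prerenderer knows to wait for the flag.

```javascript
// Signals readiness first, then starts the audio download.
// `win` is the page's window object; `startAudioLoad` is a hypothetical callback
// standing in for whatever code currently fetches the MP3s.
function markReadyThenLoadAudio(win, startAudioLoad) {
  win.prerenderReady = true; // tells the prerenderer the HTML can be captured now
  startAudioLoad();
}
```

In the browser this would be called as `markReadyThenLoadAudio(window, yourLoader)`.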

If there are other questions about this, please let us know.