Netlify Large Media TTFB

Site name: https://bruggisserpartner-ch.netlify.app/

Hi there!

I am using Large Media for the first time. One thing I discovered is that TTFB is quite bad. I mean 400-900ms bad. Here is a URL of one image:

Here is one request id where I had 900ms TTFB:
x-nf-request-id 01FQAH1352DSB82Z0AKYWQEWZE

What is the reason for this? Is this a temporary issue?

Hi, @luksak. I’m showing that the request for that x-nf-request-id was for a transformed version of a 500 KB JPEG image.

When an uncached transformation is requested, the initial image request will be much slower as the image must be transformed before the response can be served.

Now, if there is high traffic for a site and the same image transformation is used for multiple site visitors, subsequent requests for the same transformation should be cached by the CDN node and the requests after the first uncached request will be much faster. This is because the transformation won’t need to be repeated and the cached version will be served instead.

If there are other questions about this, please let us know.

@luke This was not the first request for the image. I requested it many times to debug performance.

Let me give you a few more slow request ids (those are subsequent requests):

01FQBPAATWW6HJH9AJ1BE9K7PV (666ms)
01FQBPC7QB3D2E5V6YXG4QD4SG (573ms)
01FQBPD0CCWCMWRED0P5FPPBJ2 (516ms)

Do you see faster responses for this URL?

https://61bb63e52eb27189b2439e98--c754ed02-868d-46d6-bd7d-58d31eda596e.netlify.app/img/signaletik-ovaverva-hallenbad-02-2-1.jpg?w=1536&nf_resize=fit

For already transformed images I’d expect response times of maximum 20ms.

As a reference: I have another site on netlify that doesn’t use Large Media. It generates the images during the build process. There I have usually 5ms response time, worst case is 15ms.

Could you elaborate on why the response times are that bad? To me it appears that there is no CDN caching at all.
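For anyone who wants to reproduce timings like the ones above, here is a minimal sketch of a TTFB measurement using only the Python standard library. The function names `measure_ttfb` and `summarize` are made up for this example, and the measurement deliberately excludes DNS, TCP and TLS setup, so the numbers may differ slightly from what a browser or load testing tool reports.

```python
import http.client
import statistics
import time
from urllib.parse import urlsplit


def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Return the seconds between sending a GET and receiving the response headers."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.connect()  # connection setup happens here, before the timer starts
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()  # drain the body before closing
        return elapsed
    finally:
        conn.close()


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Summarize TTFB samples given in milliseconds."""
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }
```

Calling `summarize([666, 573, 516])` on the three timings listed above gives a minimum of 516, a median of 573 and a maximum of 666. To collect fresh samples, call `measure_ttfb(url) * 1000` in a loop against the image URL.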

Hi, @luksak. I am showing that all three x-nf-request-ids above were cache misses as well.

Looking at the time of day when these requests were made and the location of the CDN node, this would be a high traffic time of day for the location. It appears the size of the asset and the load on the server is causing the image to drop out of the cache very quickly. If it were being requested with a higher frequency or if the transformed image were smaller, this would result in the asset staying in the cache for longer.

@luke I’d expect a request to be a hit the second time I request that URL. And then it should stay in cache for at least an hour, even better a month.

I just ran around 3000 requests to that image using a load testing tool in the past 15 minutes. I still see bad performance (~600ms) requesting that image. In this case I’d expect the CDN to have the asset in cache.

Apart from that I see three issues with the headers Large Media sets:

  • cache-control: public, max-age=0, must-revalidate leads to no browser caching and bad performance
  • There is no header that lets me debug cache hits/misses
  • There is no header that informs me which CDN location I hit
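To make the first point concrete, here is a small sketch that parses a Cache-Control value and checks whether a browser may reuse its stored copy without contacting the server. The helper names are hypothetical, and the check is simplified: it ignores the Expires header and heuristic freshness.

```python
from typing import Optional


def parse_cache_control(header: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control header value into a directive -> value mapping."""
    directives: dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def reusable_without_revalidation(header: str) -> bool:
    """Simplified check: True only if the response stays fresh for some time."""
    directives = parse_cache_control(header)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and max_age.isdigit() and int(max_age) > 0


header = "public, max-age=0, must-revalidate"
print(parse_cache_control(header))
print(reusable_without_revalidation(header))
```

With `max-age=0` the stored response is stale immediately, and `must-revalidate` forbids using a stale copy, so the browser has to make at least a conditional request for every use. That round trip is where a slow TTFB becomes visible to the visitor.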

Which data center am I hitting with the request ids I sent in my last post?

To illustrate the performance impact of this, I compared two sites with exactly the same stack (Netlify and Nuxt), one using NLM and one not:

With NLM:

Without NLM:

Hi, @luksak. The default cache control headers have a purpose which is described in this blog post:

It is true we don’t have a header which shows the cache hits and misses (and there are reasons why we don’t have one).

Regarding being able to see which CDN location is used, yes, there is no header that exposes that. However, that can be determined via the IP address and a geo-IP lookup, for example using the MaxMind GeoIP2 Databases Demo.
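As a rough sketch of the first half of that lookup, the snippet below resolves a hostname to its IP addresses with the Python standard library. `resolve_ips` is a name made up for this example. Note that DNS can return different addresses at different times, so the result is not guaranteed to be the node that served an earlier request. The resulting address can then be pasted into a geo-IP lookup tool.

```python
import socket


def resolve_ips(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve_ips("bruggisserpartner-ch.netlify.app"))
```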

The CDN node for all three x-nf-request-ids shared was one hosted in AWS in the eu-central-1 (Frankfurt) region using the IP address 3.125.252.47. All three used that same node.

Now, I also consider it both unusual and wrong for the repeated requests not to be cached. It is true that they were not being cached but should have been.

So, before I filed a bug report for this, I wanted to make sure I wasn’t missing something. To this end, I was in the process of filing a research request to have our developers take a look at the non-caching behavior. In that research request I was attempting to include instructions to explain to our developers how to see the caching failure.

However, something has changed and I can no longer reproduce the cache misses!

Even with the CDN node which was misbehaving before, the requests are all cache hits now. Note, previously I was able to reproduce the issue with all CDN nodes, not just the Frankfurt one. Now, I cannot reproduce it with any.

I also ran a report for the example URL above (which was https://61bb63e52eb27189b2439e98--c754ed02-868d-46d6-bd7d-58d31eda596e.netlify.app/img/signaletik-ovaverva-hallenbad-02-2-1.jpg?w=1536&nf_resize=fit).

In the last 24 hours, there have been 2341 distinct requests for that specific URL. Of those requests the cache hit and miss counts are as follows:

  • hits: 2297
  • misses: 44

In other words, the URL is being cached now. Are you still seeing the slow loading behavior now?
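As a quick check of those numbers, the hit ratio works out as follows (plain arithmetic on the counts above, nothing Netlify-specific):

```python
hits = 2297
misses = 44

total = hits + misses      # should match the 2341 requests reported
hit_ratio = hits / total

print(f"{total} requests, hit ratio {hit_ratio:.1%}")
# prints: 2341 requests, hit ratio 98.1%
```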

@luke I realized that I am hijacking my own issue :) I created a separate one for the cache-control header: Improve Netlify Large Media cache-control header

Ok, this information helps a lot. Let’s try to figure this out.

CDN POP location: Frankfurt sounds like the closest one to me. So that’s fine.

Cache hits/misses: That ratio is quite good. So that is not the issue. But if a cache hit has a response time of ~600ms, that is quite bad. Please give me an update once you have feedback on your bug report.

FYI: I see very similar response times on your NLM demo: https://netlify-photo-gallery.netlify.app/

Hi, @luksak. There is no bug report because I could no longer reproduce the issue.

If you are still seeing the slow TTFB, would you please send a HAR recording of the issue occurring?

No, the issue isn’t solved.

Alright, I sent you a PM.

What response times do you get on one of the urls I am testing?

Hi, @luksak. I believe I have discovered the reason for the issue. I looked at the requests in the HAR file sent and the requests were again uncached on that CDN node. Looking at the URL in the HAR file, it has only been requested 30 times in the last 24 hours and no single CDN node has gotten more than 5 requests in that time window.

With a request rate that low, it is very unlikely for the images to be cached in local CDN nodes. There is more information about how the Large Media caching works in the post here:

I’m going to quote/inline some of that post below:

Here’s details about how we cache and charge:

  • Large Media has a separate S3 bucket that we cache transforms in for 30 days. (In other words, it’s not affected by caching timeouts or behaviors we do with other pages/assets - you can count on a specific transformation result to be cached for 30 days regardless of our standard CDN node caching status).
  • We also cache on our CDN nodes the same way we handle caching for all other assets. (This also adds an extra layer of performance boost, where if a user is browsing your site on a single CDN node, it won’t have to go to S3 to re-fetch the transformed image.)
  • If a large media transformation result is found in either cache already, then the user is not charged for an additional transformation.
  • Every combination of file (checked by SHA, not just filename, since you may upload a new image with the same filename) and transform parameters requires a unique transformation call. So if you request mypic.jpg?nf_resize=fit&w=300 and then request mypic.jpg?nf_resize=fit&w=301, we can’t use the first cached image, and have to do a new transformation. Similarly, if you upload a new version of mypic.jpg, and then make the above requests again, we’ll have to do two more transformations.
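The lookup order described in the quoted list can be sketched roughly as below. This is an illustrative model only, not Netlify’s implementation: the class and function names are invented, SHA-256 is an assumed hash, sorting the query parameters is an assumed normalization, and both caches are plain dictionaries with no expiry or eviction.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode


def transform_cache_key(file_bytes: bytes, query: str) -> str:
    """Key on file content plus transform parameters, not on the filename."""
    content_sha = hashlib.sha256(file_bytes).hexdigest()  # assumed hash function
    params = urlencode(sorted(parse_qsl(query)))          # assumed normalization
    return f"{content_sha}?{params}"


class TwoTierCache:
    """Toy model: a CDN node cache in front of a long-lived transform store."""

    def __init__(self) -> None:
        self.cdn_node: dict[str, bytes] = {}         # fast, local, evicts quickly
        self.transform_store: dict[str, bytes] = {}  # stands in for the S3 bucket
        self.transform_count = 0                     # billable transformations

    def get(self, file_bytes: bytes, query: str) -> str:
        key = transform_cache_key(file_bytes, query)
        if key in self.cdn_node:
            return "cdn-hit"
        if key in self.transform_store:
            # No new transformation, but a slower fetch from the store.
            self.cdn_node[key] = self.transform_store[key]
            return "store-hit"
        self.transform_count += 1
        result = b"transformed:" + key.encode()  # placeholder for real image work
        self.transform_store[key] = result
        self.cdn_node[key] = result
        return "transformed"
```

In this model, requesting `nf_resize=fit&w=300` twice gives `transformed` then `cdn-hit`, while `w=301` or a new file version triggers another transformation. If the entry drops out of the CDN node cache, the next request is a `store-hit`: free of charge, but it pays for the trip to the store.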

It is the return to the S3 bucket for cached transformations that is the issue here. That S3 bucket is in the US and the CDN node fetching from that bucket is in the EU. This adds latency to requests when the images are not cached in the local (Frankfurt) CDN node.

I have entered a feature request to have the Large Media data be distributed like the CDN nodes are distributed, however, I cannot promise if that feature will become available. In the meantime, the performance you are seeing will continue for any URLs which are not being cached by the CDN.

I know this isn’t the news you were looking for so if there are other questions about this, please reply here anytime.

Hi @luke

Well, since you haven’t changed anything, I am still seeing very bad response times, even after requesting the same file 1000 times within a minute. I just sent you another HAR of that.

I think I’ll simply have to move to something else other than Netlify Large Media, since I have other issues with it apart from this one (such as missing webp support).

Hi, @luksak. Again, there is a feature request filed for this but this is the same issue you have already reported. There is nothing new happening here.

I do see you requesting the file 1000 times in just over one minute (71 seconds). I’m showing all 1000 were cache hits.

There were only 12 cache misses for that URL in the 2-hour time window for the URL in the HAR file. The request in the HAR file was after two minutes of no traffic for that URL and the response size was 631140 bytes (including headers).

The following factors were likely involved in it being removed from the cache:

  • there had been little traffic for the URL before the burst of 1000 requests
  • 100% of the requests (all 1012) were from a single IP address
  • it is a large asset (over 0.5 MB)

All of those factors contributed to the asset dropping out of the cache in the time window between the burst and the HAR file request.

This is also why we don’t expose a header to show what is or isn’t cached. We don’t want people to attempt to manipulate the cache with inorganic traffic as you have done. It doesn’t work and we don’t want people wasting time on it for that reason.

If the feature request does become available, we will follow-up here to let you know.

If you want to remove Large Media from this site, please read the following support guide and let us know when you are ready to proceed:

Hi @luke ,

Sure, I understand most of your positions on this. But the result is quite bad performance for end users. That is what I am trying to solve.

I am not talking about measured performance. I am talking about perceived performance. And that is an issue for me.

Best Luk