Users are being served old static files and old Lambda functions

I’m running the site www.dreamingspanish.com, which is a single-page application (SPA) built with React.
After a deploy, I’m getting errors because users continue being served the old version of the static files.

So far, this has happened as late as 5 days after deploying a new version.
I’ve tracked down the issue by comparing the deploy commit of the frontend against the backend’s. This is not due to the user having kept a tab open, since I make sure to reload the page once after I detect a mismatch.
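The check described above could look roughly like the following sketch. It assumes the frontend bundle embeds its own commit (passed in as `frontendCommit`) and that API responses carry the backend commit in a `build-token` header, as in the response headers quoted later in this thread. The storage key name is hypothetical. Keying the flag on the backend commit means the page reloads at most once per backend version, so a mismatch that survives the reload can be reported instead of looping.

```javascript
// Hypothetical sessionStorage key recording which backend commit already triggered a reload.
const RELOAD_FLAG = "version-mismatch-reloaded-for";

/**
 * Compares the frontend commit with the backend commit from an API response.
 * Reloads the page at most once per backend commit and returns what happened.
 * `storage` is sessionStorage-like; `reload` is e.g. () => location.reload().
 */
function handleBuildToken(frontendCommit, backendCommit, storage, reload) {
  if (!backendCommit || backendCommit === frontendCommit) {
    // Versions agree (or the header is missing): clear any earlier flag.
    storage.removeItem(RELOAD_FLAG);
    return "ok";
  }
  if (storage.getItem(RELOAD_FLAG) === backendCommit) {
    // Already reloaded for this backend commit and still mismatched: worth reporting.
    return "still-mismatched";
  }
  storage.setItem(RELOAD_FLAG, backendCommit);
  reload();
  return "reloading";
}
```

In the browser this might be called as `handleBuildToken(FRONTEND_COMMIT, response.headers.get("build-token"), sessionStorage, () => location.reload())`, where `FRONTEND_COMMIT` is a hypothetical build-time constant.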

I’m also seeing different versions of the Lambda functions responding to the same user at the same time. For example, a call to /.netlify/functions/a is handled by a function from version 0.1 while a call to /.netlify/functions/b is handled by version 0.2. This has happened up to 3 days after a new deploy.

This is causing problems because we have to be very careful to make sure that every version of the frontend is compatible with previous versions of the backend, and vice versa.

Is this to be expected? Is there anything that can be done?

Could you share a reproduction in terms of a HAR file?

How would I do this? For every new deploy it only happens for a handful of users and I have no way of reproducing it myself unless I happened to get very lucky.

I think I could generate a HAR file programmatically from a service worker. What data would it need to contain? I would only generate the file after the version mismatch has been detected, so I think the only thing I could do at that point is to call a few different API endpoints and verify that indeed they are being handled by different versions of the backend. Would that be any different from what I’m already doing? What are you expecting to see?

We need to see the network requests being made, including the request for the HTML page itself. Each of those responses has an x-nf-request-id header (along with the rest of the metadata), and we use this data to compare what you describe against our logs.

How can I make my users’ browsers capture the requests for the HTML itself? Would I need to first create a new deploy with a service worker that captures the requests, wait for enough users to access that deploy, and then create an even newer deploy so that the issue is triggered?

Would this work? If this works at all it could show that the old static files are being served. It still won’t tell us anything about the old lambda functions being used, though.

Hi, @PabloRomanH. The HAR file was only a suggestion. All we need to troubleshoot is a single x-nf-request-id HTTP response header for an incorrect request. If you see multiple incorrect requests, please feel free to send more than one x-nf-request-id header.

The key takeaway is this:

  • Our support team must have some way to identify one of the incorrect requests.

Each x-nf-request-id header has a unique value specific to one and only one HTTP response. If you send us the header we can examine what route that HTTP response took through our systems and what deploy id and function version was used for the response.

We have a support guide about that header here:

It explains how to find the header and it also explains what information you can send instead if that header isn’t available for any reason.

Please help us identify any incorrect responses by sending us the associated header value. A HAR file will automatically contain those headers. However, the HAR file isn’t the only way to identify the incorrect responses. It is just one option available.
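Collecting the header from users’ browsers does not require a HAR file. Because the functions are called on the same origin as the page, `fetch` can read the response headers directly. A minimal sketch, assuming all API calls go through one helper and that `report` is a hypothetical callback that forwards the data to whatever error-logging service is in use:

```javascript
// Extracts the fields support would need to locate one specific response.
function describeResponse(url, response) {
  return {
    url,
    status: response.status,
    requestId: response.headers.get("x-nf-request-id"),
    // Commit header these functions already send (see the response headers later in this thread).
    buildToken: response.headers.get("build-token"),
    date: response.headers.get("date"),
  };
}

// Wraps fetch; `report` is a hypothetical callback (e.g. sends to an error logger).
async function trackedFetch(url, init, report) {
  const response = await fetch(url, init);
  report(describeResponse(url, response));
  return response;
}
```

Calling `report` only when `buildToken` differs from the frontend’s commit keeps the volume of reports small.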

I got a few. All these were handled by outdated lambda functions from the previous deploy. Each of them was handled by a different lambda function (different endpoint).

01GS34067DYNTHF9K0NCHEX1JH
01GS79E66MCJTWWRQSV1WR4AVZ
01GS779B3Y2B5E1N7360D63V33
01GS73VG58BPBJ9TP8ZQ07N60E

I don’t know if there’s a way I could get these for the static files, though. I assume a service worker will be needed for that.

Hi, @PabloRomanH.

First, I want to summarize what I found and suggest a way for you to know for certain which function version is used. I’m suggesting this because all of the function calls went to the correct function version and deploy; I see no examples where an incorrect version was used.

So, we need to revisit why you think the wrong function version is used. I’m guessing it is because the data returned is not what you expect. However, I think something else is causing the unexpected data.

However, I also want to help you prove to yourself that the function versions at Netlify are correct (or to prove me wrong and show that they are actually incorrect).

To do this, you need some way to log which version is being used. I recommend adding a console.log() statement to all functions with a value that is specific to each change to the functions. For example, you could include a timestamp like version: 2023-02-15 03:32:50 UTC for the exact second (or just the exact minute) you modify the function. Then make a new deploy with that change.

The function will then log version: 2023-02-15 03:32:50 UTC to the function logs at Netlify each time it is invoked. If you see the expected version, you know the correct version is deployed. If you do not, then we can revisit the function version hypothesis.

Here is the proof that I found the correct functions are called. I checked the x-nf-request-id header values provided and I found the following.

The response with the x-nf-request-id: 01GS34067DYNTHF9K0NCHEX1JH was handled by the function videos for deploy 63db36beda98490009218f3b. The request was processed at 2023-02-12 15:30:51 UTC and that was the correct deploy at the time.

The response with the x-nf-request-id: 01GS73VG58BPBJ9TP8ZQ07N60E was handled by the function user for deploy 63eafcef5a6feb000881bc0e. The request was processed at 2023-02-14 04:45:15 UTC and that was the correct deploy at the time.

The response with the x-nf-request-id: 01GS779B3Y2B5E1N7360D63V33 was handled by the function dayWatchedTime for deploy 63eafcef5a6feb000881bc0e. The request was processed at 2023-02-14 05:45:14 UTC and that was the correct deploy at the time.

The response with the x-nf-request-id: 01GS79E66MCJTWWRQSV1WR4AVZ was handled by the function playlist for deploy 63eafcef5a6feb000881bc0e. The request was processed at 2023-02-14 06:22:50 UTC and that was the correct deploy at the time.

So, all function versions above were correct.

About this:

I don’t know if there’s a way I could get these for the static files, though. I assume a service worker will be needed for that.

A service worker should not be required. All HTTP responses from Netlify will have that header, even for static files. You can see it in devtools for all HTTP responses. If you make a HAR recording, it will automatically capture those headers.

To summarize this all, I suggest logging a version to the function logs to prove which version is used. If that proves the correct version is used you will need to keep debugging to find why the function isn’t returning the expected data.

I can’t do that since I can’t reproduce the error on my computer. The error only happens for a handful of users every time we do a new deploy.

The way I know the old version is being used is that each response of our API contains the ID of the commit to which it belongs. When the project is built, before anything else we save the value of the environment variable COMMIT_REF to a file that is then imported by all lambda functions.
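For reference, a build step like the one described might look like this sketch. Netlify sets `COMMIT_REF` during builds; the function name, the output path, and the `builtAt` field are assumptions, not the site’s actual code.

```javascript
const fs = require("fs");
const path = require("path");

// Writes the deployed commit to a JSON file that every function can import.
function writeVersionFile(outFile, commitRef = process.env.COMMIT_REF) {
  const version = {
    commit: commitRef || "unknown", // COMMIT_REF is absent outside Netlify builds
    builtAt: new Date().toISOString(),
  };
  fs.mkdirSync(path.dirname(outFile), { recursive: true });
  fs.writeFileSync(outFile, JSON.stringify(version, null, 2));
  return version;
}
```

Calling `writeVersionFile("netlify/functions/version.json")` (a hypothetical path) before the functions are bundled, and importing that file from each function, gives every invocation the commit it was built from.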
All those requests were handled by functions belonging to the commit a7b6584089734a89452c341605fa4dddb547802d, not the latest one, ee0ba9554deaf0a68f01ad2dde045377b49c6c24.

Also, I’m seeing multiple responses from the backend with the same x-nf-request-id and old dates in the “date” field, around 2 or 3 days before the call was done. Is it possible there’s some kind of caching going on?

This is one of the responses with a date more than one day before the call was actually done:

{
  "age": "0",
  "build-token": "a7b6584089734a89452c341605fa4dddb547802d",
  "cache-control": "no-cache",
  "content-type": "text/plain; charset=utf-8",
  "date": "Sun, 12 Feb 2023 15:30:51 GMT",
  "server": "Netlify",
  "strict-transport-security": "max-age=31536000",
  "transfer-encoding": "chunked",
  "vary": "Accept-Encoding",
  "x-nf-request-id": "01GS34067DYNTHF9K0NCHEX1JH"
}
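One way to flag such responses from the browser is to compare the `date` header with the client’s clock. This is a heuristic sketch: the one-hour threshold is an arbitrary assumption, and a wrong clock on the user’s device will produce false positives, so it is better used to decide what to report than as proof of anything.

```javascript
// Assumed tolerance between the server's Date header and the client's clock.
const MAX_SKEW_MS = 60 * 60 * 1000;

// Milliseconds between the Date header and `now`, or null if the header can't be parsed.
function responseAgeMs(dateHeader, now = Date.now()) {
  const sent = Date.parse(dateHeader);
  return Number.isNaN(sent) ? null : now - sent;
}

// True when the response appears to have been generated much earlier than it was received.
function looksReplayed(dateHeader, now = Date.now(), maxSkewMs = MAX_SKEW_MS) {
  const age = responseAgeMs(dateHeader, now);
  return age !== null && age > maxSkewMs;
}
```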

Hi, @PabloRomanH. I can confirm that the response above was sent only once (Netlify does not reuse x-nf-request-id values). It was sent 2023-02-12 15:30:51.245 UTC.

Netlify CDN did not cache that response. If it is being seen more than once, it is being cached elsewhere. My best guess is that it is coming from a local browser cache or some third-party service.

Note that the Netlify CDN can cache URLs and, when On-demand Builder functions are used, it can even cache function responses.

However, when our CDN caches those responses and serves them again two things are always true:

  • our internal access logs include a field called cache which will state if the response was served from the cache or the origin
  • the x-nf-request-id will change for each cached response (even cached responses do not reuse those values - that value is never cached by Netlify)

I can tell from our logging that 01GS34067DYNTHF9K0NCHEX1JH was not a cached response. It was a cache miss and it was served from a new function invocation.

Again, even if this had been an On-demand Builder function (it was not) and the function response had been cached, the same x-nf-request-id header would not be sent again. The response body would stay the same, but that header is unique for each and every HTTP response.

To summarize, if you are seeing those responses re-used, they are not originating from Netlify. The caching is most likely the local browser but could also be some third-party service not controlled by Netlify.
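Since Netlify only ever sends a given `x-nf-request-id` once, the browser can also detect replays itself by remembering the IDs it has already seen. A minimal sketch; persisting the list in `localStorage` (so replays across page loads are caught) is left to the caller, and in practice the list should be capped.

```javascript
// Remembers x-nf-request-id values; a repeat means the response was replayed by a cache outside Netlify.
class RequestIdTracker {
  constructor(previouslySeen = []) {
    this.seen = new Set(previouslySeen);
  }

  // Returns true if the ID was seen before; otherwise records it and returns false.
  isReplay(requestId) {
    if (!requestId) return false;
    if (this.seen.has(requestId)) return true;
    this.seen.add(requestId);
    return false;
  }

  // Serializable form, e.g. for localStorage.setItem("seen-ids", JSON.stringify(tracker.toArray())).
  toArray() {
    return [...this.seen];
  }
}
```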

The user-agent for the x-nf-request-id you shared was this:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.101 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

It is the (compatible; Googlebot/2.1; +http://www.google.com/bot.html) at the end that jumps out at me. A user-agent like this can indicate a bot whose job is to create a “card” for a URL (a card is a rectangle containing an image and some text that shows more information about the URL in question). These cards are used in social media and replace the URL being posted. Those third-party services often cache the card itself. It could be that this response was scraped by a bot and is being cached by that service. I have no idea if that is actually happening, but it is a possible scenario based on the user-agent.

Again, all I can say for certain is that the caching isn’t happening at Netlify. I believe you that something is caching those responses. However, I can assure you it is not Netlify doing it.

There is one more way to prove if Netlify is caching the response or not. (I know we are not and I want you to be able to see the proof yourself.) The proof is the answer to this question:

  • What is the IP address that responded with the cached response?

The x-nf-request-id header is what we normally use to identify a specific HTTP response (because the header is a unique identifier). However, if you don’t want to use that header, the following information is needed to identify a single HTTP response:

  • the client IP address making the HTTP request
  • the server IP address that responds
  • the full URL being requested
  • the date, time, and timezone when the response occurred

If you send us that information for an incorrect response, we can track it to see if it is Netlify or not.

You can also test yourself by replacing the IP address in the example below with the IP address that responds:

$ curl --compressed -svo /dev/null https://50.18.142.31/  2>&1 | egrep 'subject:'
*  subject: C=US; ST=California; L=San Francisco; O=Netlify, Inc; CN=*.netlify.app

The IP address above is 50.18.142.31 and the SSL certificate is Netlify’s. If you replace that with an IP address which is not Netlify’s you will not see our name. For example:

$ curl --compressed -svo /dev/null https://8.8.8.8/  2>&1 | egrep 'subject:'
*  subject: CN=dns.google

So, once you find the IP address returning incorrect responses, we will know if that is Netlify’s IP address or someone else’s. (Now, I have a list of our IP addresses so I can also check that way. However, the curl method above can be used reliably without access to our IP address lists.)
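The same certificate check can be scripted with Node’s built-in `tls` module, which may be handier when checking several addresses. A sketch: certificate validation is deliberately disabled because the goal is only to read whom the certificate was issued to, and matching the `Netlify, Inc` organization is based on the curl output above rather than any documented guarantee.

```javascript
const tls = require("tls");

// Connects to ip:port and resolves with the certificate subject. No `servername` is given,
// so (as with curl and a bare IP) no SNI hostname should be sent.
function getCertSubject(ip, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host: ip, port, rejectUnauthorized: false }, () => {
      const { subject } = socket.getPeerCertificate();
      socket.end();
      resolve(subject || {});
    });
    socket.on("error", reject);
  });
}

// True when the subject's organization matches the one shown in the curl output above.
function isNetlifySubject(subject) {
  return subject.O === "Netlify, Inc";
}
```

For example: `getCertSubject("50.18.142.31").then((s) => console.log(s.CN, isNetlifySubject(s)))`.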

Please let us know if there is anything else we can do to assist with troubleshooting this issue.

Please remember that I can’t reproduce this error on my computer, so anything that requires me to run a command or use the developer tools is not a possibility. Everything I try needs to be run on the user’s browser.

I’ve gone through all the errors, and it seems that every time there was a mismatch between versions of Lambda functions, the client was Googlebot. I imagine it does some kind of caching, so I’m going to ignore those and assume they’re not going to be a problem.
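Filtering these out of the error reports can be as simple as a user-agent check. A sketch; the crawler names beyond Googlebot are assumptions and the list is not exhaustive, and user-agent strings can be spoofed, so this only reduces noise.

```javascript
// Assumed list of crawler tokens; extend as new bots show up in the reports.
const CRAWLER_PATTERN = /\b(Googlebot|AdsBot-Google|bingbot|facebookexternalhit|Twitterbot)\b/i;

// True when the user-agent string looks like a known crawler.
function isCrawler(userAgent) {
  return CRAWLER_PATTERN.test(userAgent || "");
}
```

In the browser, `if (!isCrawler(navigator.userAgent)) { /* send the mismatch report */ }` would skip reports from those bots.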

What I’m left with now are browsers that are using days-old versions of the frontend. I check this after reloading the page once, so I know it’s not just that the user kept a tab open since before the update.
Is there any way of checking if this is due to the CDN serving an old version of the frontend or if it’s the browser’s cache? I’ve detected this issue on Chrome, Safari and Firefox.
Do I need to write a web worker to be able to get this information?
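A worker is probably not needed just to tell whether the HTML and bundles came from the browser’s own cache: the Navigation Timing and Resource Timing APIs expose a `transferSize` for each entry. The sketch below relies on a heuristic assumption: a `transferSize` of 0 with a non-empty decoded body usually means the response was served locally without a network transfer, while all-zero sizes mean the browser withheld the numbers (for example, for cross-origin resources without `Timing-Allow-Origin`). Support for these fields varies between browsers, so the values are worth checking in Chrome, Safari and Firefox before relying on them.

```javascript
// Heuristic classification of a timing entry by where its bytes came from.
function classifyEntry({ transferSize, decodedBodySize }) {
  if (transferSize > 0) return "network";
  if (decodedBodySize > 0) return "local-cache";
  return "unknown";
}

// Reports how the HTML document and the script bundles of the current page were obtained.
function collectLoadSources(performance) {
  const entries = [
    ...performance.getEntriesByType("navigation"),
    ...performance.getEntriesByType("resource").filter((e) => e.initiatorType === "script"),
  ];
  return entries.map((e) => ({ url: e.name, source: classifyEntry(e) }));
}
```

Sending `collectLoadSources(window.performance)` along with a version-mismatch report would show whether the outdated `index.html` arrived over the network or came from a local cache.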

As mentioned above, the only information that is useful to us for investigating is the x-nf-request-id. However, unless you’re hardcoding deploy permalinks in the function calls, like:

fetch('https://deploy-id--site-id.netlify.app/.netlify/functions/function-name')

your function call should always be made to the latest production deploy. For example:

fetch('/.netlify/functions/function-name')

would always point to the latest published deploy, no matter which version of the frontend is being used (again, unless you’re testing this on a deploy permalink). The first case described above would be abnormal and uncommon (it would also need correct CORS headers), so I don’t think that is happening here, and you also mention you’re using the second option. In that case, the latest deploy should be the one being used.

The only other way to check would be to ask your clients to clear cache from their browsers and check if they are still getting outdated responses.
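A less disruptive alternative to asking users to clear their cache is to have the page itself fetch `index.html` with `cache: "no-store"`, which bypasses the browser’s HTTP cache for that request, and compare the script URLs it references with the ones the running page loaded. A sketch, assuming the root path `/` serves the SPA’s `index.html` and that bundle file names change on every deploy. The regex is only meant for that generated file, and a registered service worker could still answer the request from its own cache.

```javascript
// Extracts script src values from generated HTML (simple regex, not a general HTML parser).
function extractScriptSrcs(html) {
  const srcs = [];
  const re = /<script[^>]*\ssrc=["']([^"']+)["']/gi;
  let match;
  while ((match = re.exec(html)) !== null) srcs.push(match[1]);
  return srcs;
}

// Fetches a fresh index.html and returns true if it references scripts the running page did not load.
async function htmlIsStale(doc = document) {
  const response = await fetch("/", { cache: "no-store" });
  const fresh = extractScriptSrcs(await response.text());
  const loaded = [...doc.querySelectorAll("script[src]")].map((s) => s.getAttribute("src"));
  return fresh.some((src) => !loaded.includes(src));
}
```

If `htmlIsStale()` resolves to `true` right after the single reload, the network returned a newer `index.html` than the one the page is running, which suggests the stale copy came from a cache on the user’s side.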

Hi, @PabloRomanH. I also wanted to chime in again because I can see you are working hard to resolve this issue.

I really don’t think it is our CDN caching incorrectly and I’m worried that focusing on that hypothesis may cause you to not pursue other avenues of investigation which are more likely to provide answers.

In other words, I think the CDN caching hypothesis is likely a false lead. If it is not the root cause, all the time spent trying to find a caching issue in the CDN will only prove that it was not the cause; it still won’t tell you what the cause is.

If our CDN is caching things incorrectly, no one else is reporting it. It seems very unlikely that just your one site is being cached incorrectly and everyone else’s sites are working normally. If there was some sort of issue with our CDN, other people would be reporting the issue. Because that isn’t happening, my best guess is that something else is at play here.

Also, we (the support team at Netlify) are in the same situation as you are regarding this comment:

Our support team also needs to see the behavior of the end user’s browser to explain this. Just as you don’t have access to that browser to debug, neither does anyone at Netlify. Anyone trying to explain this behavior would ideally examine an impacted browser to do so.

I do hope you are able to find more information to reveal the root cause here. Again, I just wanted to add my input that I don’t think CDN cache is the cause.

I encourage you to try to reproduce the caching issue locally or to get access to an impacted browser to troubleshoot it directly. In my opinion, these are the most likely ways to find the true root cause here.


That’s why I’m trying to figure out if browsers will keep the cached versions of static files even after the browser reloads the page.

I keep asking if I could see that from a web worker, so that I don’t waste time trying it if it’s not going to be helpful. I’ve already asked three times but haven’t gotten an answer yet.

Our support team’s scope of support doesn’t include answering general coding questions. We were instead focusing on the issues that were specific to Netlify (if our CDN was caching incorrectly which it was not) and we attempted to assist in that way. However, I should have acknowledged the question, even if I wasn’t answering it, and I’m sorry that I did not do so.

This service worker CacheStorage API documentation appears to indicate that doing so is possible:

What isn’t clear is how the information about the local cache state would be reported to you for debugging purposes. However, if you have some way to export that debugging information (perhaps by using a third-party error logging service), service workers are capable of examining the previous states of the files in their caches and reporting that information.
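A minimal sketch of such a report, assuming `caches` (the global `CacheStorage`) is passed in and the result is sent to a third-party logging service. One caveat: the Cache API only holds entries that a script explicitly stored there (for example, a precaching service worker); it cannot inspect the browser’s regular HTTP cache. `navigator.serviceWorker.getRegistrations()` can additionally show whether any service worker is registered for the site.

```javascript
// Lists every URL stored via the Cache API, grouped by cache name.
// Works in a page or in a service worker; pass the global `caches` object.
async function listCachedUrls(cacheStorage) {
  const result = {};
  for (const name of await cacheStorage.keys()) {
    const cache = await cacheStorage.open(name);
    const requests = await cache.keys();
    result[name] = requests.map((request) => request.url);
  }
  return result;
}
```

For example, `listCachedUrls(caches).then((summary) => report(summary))`, where `report` is a hypothetical function that forwards the summary to the logging service.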