Netlify functions uptime issue

Hello,

I have an app that has been deployed on Netlify for a few years. The app uses Netlify functions and I have Datadog monitoring a health function that simply returns 200 OK (every 15min). It’s been running with very little alerts for years but recently started failing almost on a daily basis. The uptime of that function over the last month is 97.5% only. I’m posting below both the uptime chart and the last failed check in case it helps anyone understand what I’m facing.

This happens in both prod and dev (setup as 2 different sites). Name of the site is “smplrspace”.

Any idea what might be happening here?

Thanks in advance for the help.
Thibaut

hi there @tibotiber , thanks for writing in and sorry to be slow to reply.

first off, i am wondering if anything that you are seeing sounds related to what is described here:

could you give that a read through and let us know?

Hi @perry, thanks for the reply and no issue with the delay. The link you sent across is very interesting and I learnt quite a bit going through the thread, so thanks :).

At first I thought it could be linked it, however the errors I’m getting are not 502s, but timeouts. DNS and SSL are always going through, then it times out on the response (over 30s). You can see the screenshot in my previous post, it’s representative of the bulk of the errors I’m getting.

I’m adding a screenshot of the tests that failed recently.

One more thing I just thought about: these are being executed from an AWS server in Singapore, while I think your default function execution is US east coast isn’t it? Could it be a simple connection issue? The weird thing is we pass SSL.

Hey @tibotiber,

I believe you can provide some more information to help us like what exactly is the error that you get? Any status codes? Response headers?

Hi @hrishikesh, I’m getting a TIMEOUT which means I have no response to provide at all. The best I can do (that I know of) is the 2nd screenshot I provided in the initial post. Do note that both DNS and SSL pass before timing out , although note sure what this tells you about that.

Hey there, @tibotiber :wave:

Thanks so much for your patience here.

We have looked into this issue further for you, and all the timings we see are less than 700 ms, or 0.7 seconds! We do not see any requests nearing the 30 second timeout you referenced. If you are still experiencing issues, can you provide a reproduction of this issue for us to look into further?

Thanks for following up @hillary, and those timings are super useful to me. I realise that the endpoints where this is happening are redirects, so the DNS, SSL might be going well on the Netlify side, then the request is redirected and that’s where the TIMEOUT happens. I’m trying to get the same to happen with the redirect performed on another provider, which would rule out Netlify and basically point to my origin server as the source of the problem.

Ball is in my court now, thanks again for the support here.

1 Like

Hey there, @tibotiber :wave:

Happy to help! Thanks for coming back and confirming. I will leave this thread open for you to respond on once you know more!

1 Like