I have an app that has been deployed on Netlify for a few years. The app uses Netlify functions and I have Datadog monitoring a health function that simply returns 200 OK (every 15min). It’s been running with very little alerts for years but recently started failing almost on a daily basis. The uptime of that function over the last month is 97.5% only. I’m posting below both the uptime chart and the last failed check in case it helps anyone understand what I’m facing.
Hi @perry, thanks for the reply and no issue with the delay. The link you sent across is very interesting and I learnt quite a bit going through the thread, so thanks :).
At first I thought it could be linked it, however the errors I’m getting are not 502s, but timeouts. DNS and SSL are always going through, then it times out on the response (over 30s). You can see the screenshot in my previous post, it’s representative of the bulk of the errors I’m getting.
I’m adding a screenshot of the tests that failed recently.
One more thing I just thought about: these are being executed from an AWS server in Singapore, while I think your default function execution is US east coast isn’t it? Could it be a simple connection issue? The weird thing is we pass SSL.
Hi @hrishikesh, I’m getting a TIMEOUT which means I have no response to provide at all. The best I can do (that I know of) is the 2nd screenshot I provided in the initial post. Do note that both DNS and SSL pass before timing out , although note sure what this tells you about that.
We have looked into this issue further for you, and all the timings we see are less than 700 ms, or 0.7 seconds! We do not see any requests nearing the 30 second timeout you referenced. If you are still experiencing issues, can you provide a reproduction of this issue for us to look into further?
Thanks for following up @hillary, and those timings are super useful to me. I realise that the endpoints where this is happening are redirects, so the DNS, SSL might be going well on the Netlify side, then the request is redirected and that’s where the TIMEOUT happens. I’m trying to get the same to happen with the redirect performed on another provider, which would rule out Netlify and basically point to my origin server as the source of the problem.
Ball is in my court now, thanks again for the support here.