For a while now (I don't know exactly when it started), we have been receiving 502 errors on a URL that is proxied to our backend server. Most requests succeed, but occasionally we receive a 502 response. If I retry the same call after receiving an error, it works again (200 response). I'm not able to reproduce it consistently.
The URL is used to poll for the status of a computation and is called every 2 seconds from our frontend.
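Since a retry of the same call succeeds, the polling code could retry a 502 a few times before reporting a failure. This is only a minimal sketch of that idea, not our actual frontend code: `poll_with_retry`, the `fetch` callable, and the retry and backoff values are all hypothetical names and numbers chosen for illustration.

```python
import time
from typing import Callable, Tuple


def poll_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    backoff_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() once; while it returns 502, retry up to max_retries times.

    fetch is a hypothetical callable returning (status_code, body).
    The wait grows linearly: backoff_s, 2 * backoff_s, 3 * backoff_s, ...
    """
    status, body = fetch()
    attempt = 0
    while status == 502 and attempt < max_retries:
        attempt += 1
        sleep(backoff_s * attempt)  # short pause before asking again
        status, body = fetch()
    # Either a non-502 response, or the last 502 once retries are exhausted.
    return status, body
```

This would only hide the symptom from users; it does not explain why the 502 happens in the first place.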
Things I have done/checked:
I tested it with curl. No errors there.
All response times are under 1 second (between 100 and 200 ms).
I checked our backend logs and cannot see the request arriving there. An AWS Elastic Load Balancer sits in front of our backend server.
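Because the error is intermittent, a small probe that repeats the request at the polling interval and records every 502 might help catch one with enough detail to compare against the logs. The following is a rough sketch under assumptions: `fetch_status` and `probe` are hypothetical helpers, and it assumes the proxy response carries an `x-nf-request-id` header that can be used to identify the failing request.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def fetch_status(url, timeout=10.0):
    """Return (status_code, headers) for a GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error still carries the code and headers.
        return err.code, dict(err.headers.items())


def probe(url, count, interval_s=2.0, fetch=fetch_status, sleep=time.sleep):
    """Request url `count` times and return one record per 502 response."""
    failures = []
    for i in range(count):
        status, headers = fetch(url)
        if status == 502:
            # Header names are case-insensitive, so normalise before the lookup.
            lowered = {k.lower(): v for k, v in headers.items()}
            failures.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "status": status,
                # Assumption: the request ID is exposed in this header.
                "request_id": lowered.get("x-nf-request-id"),
            })
        if i < count - 1:
            sleep(interval_s)  # mimic the 2-second polling interval
    return failures
```

Running `probe` against the polling URL for a few hundred iterations and keeping the returned records would give timestamps (and, if the header is present, request IDs) to look up on both sides.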
I checked our logs but there doesn’t seem to be any explanation for a 502.
Not all of them have an explanation, but the ones that do have an error message say this:
Unable to find site
This doesn’t seem to be a message sent by Netlify. Would you be able to check your backend to see if there’s any additional information that might help?
I can see all successful requests arriving at the backend.
When the 502 occurs, the requests stop coming in and I cannot see the failing request in our logs.
So as far as I can see, the failing request is not reaching our backend.
Maybe something is going wrong at the load balancer level, but I will have to investigate how to troubleshoot that.
Has anything changed on Netlify’s side in the last few weeks? This didn’t happen before and, as far as I know, we haven’t changed anything in the area that is now causing problems.
I’ll try to get more info on this in due time. I’m having some issues checking our logs right now, but will follow up with the thread as soon as possible. Sorry to keep you waiting.
Thanks so much for your patience. Are you still seeing this error this week? We ran an investigation and see three instances from the past 30 days, so we want to confirm whether the issue persists.
If you are still receiving 502s, please let us know and share the latest log where you see it.
The IDs that you shared now show a completely different error from the one that was happening before. This is currently being investigated, and we’ll update this post accordingly.