502 error just occurred for no apparent reason

I just got some 502 errors from a CloudFront proxy I have in front of my Netlify functions (which themselves are using a redirect in netlify.toml to redirect a more human URL to the .netlify/functions/[my-function] one). It was working just fine before, and is all back to normal now but it was “out” for a couple of minutes. No code updates happened.

There was no deploy happening at the time so I’ve ruled out the Lambda going down during deployment, though I was under the impression Netlify deploys were pretty atomic anyway (maybe the Functions can’t be fully atomic, I’m not sure, but they’d still be pretty close to instantaneous, I’d imagine).

Is there any way of seeing some kind of logs for this? FWIW going to “Functions” in my app’s dashboard and then clicking on the function to get the logs reveals nothing at all, just “Waiting for logs” (and it’s always that, regardless of redeploying), but I know there’s a separate thread about Netlify Functions’ atrocious logging situation (argh, just found this as well, that explains my exact issue - that’s awful!!), plus I’d only expect console.log stuff to go in there, not access logs, etc.

So yeah, is there any way to determine what was going on at that point?

We do not support being proxy’d to (that is - you can do it, but we can’t help debug). The 502 error will be impossible for us to debug anyway, since it came from Cloudflare’s infrastructure. Perhaps their Support team has some insights, but we won’t ever have any…

You should probably check out this article for some more details on using Cloudflare in front of us - the SSL issue is not yours, but the rest of the caveats apply: [Support Guide] What problems could occur when using Cloudflare in front of Netlify?

We’ll be happy to help troubleshoot non-proxy’d-to functions of course, if you see the problem there!

Btw I use CloudFront not Cloudflare (so it’s all in AWS), but fair enough on still not having access to my CloudFront logs. But FWIW CloudFront itself received a 502 from the call to the function; it didn’t generate the 502 on its own because of some unknown error.

I do apologize for getting CloudFront confused with Cloudflare :man_facepalming: . While I don’t anticipate it being anything actionable, I could look into the occurrence of the 502 to see if there was anything obvious in our internal logs (such as a client hangup from CloudFront) at the time, if you have a URL and a precise-ish timestamp + timezone.

Our CloudFront logs have two requests at the following times, both UTC:

  • 2019-08-09 12:08:25
  • 2019-08-09 12:09:22

App ID: c2d0d120-97b3-4711-bd41-505cef8d7df3 for a URL prefixed with /-/f

Don’t see anything like that at 12:09 (no 5xx responses on that site from our side except for 12:08 and some client hangups around 12:57 during that hour), but I can see the 12:08 one in our logs. The email address used in the query string was @aol.com, and the user agent was Amazon CloudFront.

As far as I can tell, that request had not finished responding within our default 10s timeout, and so we cut it off and called it a 502. I am not sure of this, but since I see the function returning at the 10s mark and that is the default timeout, it seems pretty likely to me.
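If it helps to spot invocations that are creeping towards that limit, here is a minimal sketch of timing a handler and logging how close it came to the timeout. The names `ASSUMED_TIMEOUT_MS`, `classifyDuration`, and `timed` are hypothetical, the 10s figure is taken from the explanation above, and the 80% warning threshold is an arbitrary choice for illustration.

```typescript
// Assumption: the 10s default timeout described above.
const ASSUMED_TIMEOUT_MS = 10000;

type Verdict = "ok" | "near-timeout" | "at-or-past-timeout";

// Classify an elapsed time relative to the assumed timeout.
// warnFraction (hypothetical, default 0.8) marks where "near" starts.
function classifyDuration(
  elapsedMs: number,
  timeoutMs: number = ASSUMED_TIMEOUT_MS,
  warnFraction: number = 0.8
): Verdict {
  if (elapsedMs >= timeoutMs) return "at-or-past-timeout";
  if (elapsedMs >= timeoutMs * warnFraction) return "near-timeout";
  return "ok";
}

// Wrap any async handler so each invocation logs its duration.
// The clock is injectable so the wrapper can be exercised without waiting.
function timed<E, R>(
  name: string,
  handler: (event: E) => Promise<R>,
  now: () => number = Date.now
): (event: E) => Promise<R> {
  return async (event: E): Promise<R> => {
    const start = now();
    try {
      return await handler(event);
    } finally {
      const elapsed = now() - start;
      console.log(`[${name}] ${elapsed}ms ${classifyDuration(elapsed)}`);
    }
  };
}
```

One caveat: if the platform cuts the invocation off at the limit, the final log line may never be written, so the useful signal is more likely to be the "near-timeout" entries on requests that only just made it.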

Thanks for that. So yeah it looks like if we get a 502 it’s likely to be the 10s timeout then? That’s good to know if it’s true (you said you can’t be absolutely certain).

I guess that makes sense, as it is possible they will happen on occasion due to the other third party services my Function calls, sometimes calling other Lambdas that will need to wake up.
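For anyone hitting the same thing, one way to guard against it is to give the slow upstream calls their own, shorter deadline, so the function can answer with an error of its own choosing before the 10s cutoff. This is only a sketch under assumptions: `raceWithBudget`, `toResponse`, and the 8s budget are hypothetical names and values, and the `{ statusCode, body }` response shape is the usual Lambda-style one rather than anything confirmed in this thread.

```typescript
// Hypothetical budget, chosen to stay under the ~10s limit discussed above.
const UPSTREAM_BUDGET_MS = 8000;

type Outcome<T> = { kind: "done"; value: T } | { kind: "timeout" };

interface SimpleResponse {
  statusCode: number;
  body: string;
}

// Race the upstream work against a timer. Note this does not cancel the
// upstream call; it only stops waiting for it.
function raceWithBudget<T>(
  work: Promise<T>,
  budgetMs: number = UPSTREAM_BUDGET_MS
): Promise<Outcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ kind: "timeout" }), budgetMs);
  });
  const done = work.then((value): Outcome<T> => ({ kind: "done", value }));
  return Promise.race([done, deadline]).then(
    (outcome) => {
      clearTimeout(timer); // avoid leaving a pending timer behind
      return outcome;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    }
  );
}

// Map the outcome to a response: 504 on timeout, 200 otherwise.
function toResponse(outcome: Outcome<string>): SimpleResponse {
  if (outcome.kind === "timeout") {
    return { statusCode: 504, body: "Upstream service timed out" };
  }
  return { statusCode: 200, body: outcome.value };
}

// Usage sketch: callUpstream stands in for the third-party call.
async function respond(
  callUpstream: () => Promise<string>
): Promise<SimpleResponse> {
  return toResponse(await raceWithBudget(callUpstream()));
}
```

The point of the design is that a deliberate 504 with a clear message is easier to tell apart in the CloudFront logs than a 502 produced when the platform cuts the request off.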