Random 503 error from Netlify functions

I have multiple functions on my GatsbyJS site, and I am tracking 8 or more of these “Request failed with status code 503” errors a day; however, I cannot reliably reproduce them.
I have used both Axios and Fetch, but both are still intermittently returning the error.

Has anyone else experienced anything similar and found something that improved it? All suggestions are much appreciated.

Some package versions:
“netlify-lambda”: “1.5.1”
“gatsby”: “2.13.41”
“gatsby-plugin-netlify”: “2.1.0”

If you can tell us the x-nf-request-id or a timezone+timestamp+path for a function execution that failed, we can see if there are any additional internal logs from our proxy. I’ve seen 50x errors before that are due to a few reasons such as SSL negotiation failure, DNS lookup failure, etc, and sometimes our logs have usable details on it (and sometimes not). But, happy to check for you if you help me with those details :slight_smile:
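For anyone collecting those details, here is a minimal sketch of logging them on the client whenever a call fails. `describeFailure` and `callFunction` are hypothetical names, and it assumes the function is called from the same origin with `fetch`, so the `x-nf-request-id` response header is readable:

```javascript
// Hypothetical helper: gathers the details support asked for
// (request id, timestamp with timezone, path) from a failed response.
function describeFailure(response, path, now = new Date()) {
  return {
    status: response.status,
    requestId: response.headers.get('x-nf-request-id'),
    path,
    timestamp: now.toISOString(), // ISO 8601, always in UTC
  };
}

// Hypothetical wrapper around fetch that logs failures and
// hands the response back to the caller unchanged.
async function callFunction(path, options) {
  const response = await fetch(path, options);
  if (!response.ok) {
    console.error('Function call failed', describeFailure(response, path));
  }
  return response;
}
```

With Axios the same header should be available on `error.response.headers`, but the error handling would need to live in a `catch` block instead.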

Hey @fool, I am having a really similar issue with my lambda function that queries a MySQL db, except it’s a 502.

This is the error message I get back
{errorMessage: "RequestId: 77d237c0-cf7d-46a9-a966-27fc225f2747 Process exited before completing request"}

Do you think you could point me in the right direction on how I should debug this? My setup works fine if I send a simple response back instead of doing a mysql query. This is the post I have about it here

We have a 10 second timeout on functions, could your database take longer than that to answer a query @rmbh4211995 ?

Hey @fool, here’s the x-nf-request-id of a function that just returned a 503.

x-nf-request-id: 68b56417-b42a-45c1-8814-4fef28f7b70b-34923387

We are pretty stumped by it. We rely on these functions for placing orders, so it’s quite a bad issue for us.

It’s also happening on multiple Netlify functions across our site, not just the one.

Thanks for your help!

That request took 28 seconds to not return an answer, so our infrastructure hung up, as it always will. Your function did not send as much as HTTP response headers in that time window. I’m surprised it took 28 seconds rather than 10, but regardless, you need to make sure your function returns something way faster :slight_smile:

@fool If the function took 10+ seconds, why would it not return a 502?

Can PR environments affect this?
We seem to be getting them more commonly there.

This also is not just a single-function issue: all of our functions started responding with these 503 errors intermittently in the past 3 weeks (with no code changes), after none occurred in the previous 6+ months or so that we’ve used them.

It’s just got me confused as to why all our functions would start taking 10+ seconds; they are just basic API calls like posting fields to Zapier.

Your function would need to return the 502, but since the connection is no longer there, it can’t. Our system doesn’t have any context on what your function is doing, so when the connection closes with no response, it sends a 503.

As to why your functions are taking 10+ seconds to execute, I’m not able to tell without further information. It could be that the API endpoint it was invoking was slow to respond or down intermittently.

@Dennis I can create a 502 by setting a timeout in our function. We are not returning it; it is Netlify generating that 502 response.

Is there any way you know of to reproduce a 503?

Hmm, you said that Netlify is responding with a 503. In any case, how long do you have your timeout set for? You’ll need to make sure it triggers before the 10 second function runtime limit in order for your function to be able to respond with a 502.

Not sure what you mean by reproducing a 503. That happens anytime your function takes more than 10 seconds to send back a response.

Let me know if I misunderstood what you were saying. If so, please provide more detail/context and we’ll go from there.

@Dennis Not sure if I’m following. If this is the case, why do both of these return a 502?
source: https://github.com/fraserisland/tiles/blob/master/lambda
site: https://adoring-ritchie-28b281.netlify.com/

@Dennis I just received a 503 from the code below, there were no logs in Netlify for this function call. It worked after waiting about 30 seconds and trying again.
The x-nf-request-id: 0c33bcd5-de5e-4dc4-94d5-23e458236272-23721295

require('dotenv')
const cors = require('../lib/enableCors')

exports.handler = cors.enableCORS(async (event, context, callback) => {  
  console.log('called signupEmail')

  try {
    callback(null, {
      statusCode: 200,
      body: 'success'
    })
  } catch(err) {
    console.log(err)
    callback(null, {
      statusCode: 500,
      body: 'error'
    })
  }
})

@fraserisland, I’m showing the response timing for that API call was 28002 ms (~28 seconds). This is over the 10 second timeout on the gateway and, because of this, the gateway responded with the 503.

@luke What would cause that function to take 28 seconds to respond, then pass if I wait a few seconds and try again?

Couldn’t say; what does your code do? It is your code not sending anything for 28 seconds that causes this :slight_smile:

I understand that our function logging isn’t awesome, so this can be hard to troubleshoot, but it’s the only path to understand this as we have no way to “peer inside of” a running function, only look (along with you) at the logged output, to see what was written while running. If nothing is written, there is no further investigation that we can do.

@fool here is the code in the enableCors function:

const enableCORS = (endpoint) => (event, context, send) => {
    const modifyResponse = (request, response) => {
      request = request || {};
      request.headers = request.headers || {};
      const requestHeaders = request.headers['access-control-request-headers'];
      response = response || {};
      response.headers = response.headers || {};
      response.statusCode = response.statusCode || 200;
      response.headers['Access-Control-Allow-Origin'] = '*';
      response.headers['Access-Control-Allow-Headers'] = requestHeaders || '';
      response.headers['Access-Control-Allow-Methods'] = 'GET,POST';
      response.body = response.body || '';
      return response;
    };
  
    if (event.httpMethod === 'OPTIONS') {
      send(null, modifyResponse(event));
    }
    else
    {
      endpoint(event, context, (error, response) => {
        send(error, modifyResponse(event, response));
      });
    }
}

You have all the code for our function which returned a 503, if what you’re saying is true, why would it take Netlify 28 seconds to console.log, and set some headers?

@fraserisland, in regards to this:

You have all the code for our function which returned a 503, if what you’re saying is true, why would it take Netlify 28 seconds to console.log, and set some headers?

Thank you for sticking with this. The answer to this question is that I think you are correct and what I said is not true. Sorry about my mistake and, again, your diligence about this is both helpful and appreciated. :+1:

We have filed an issue to track this (internal only - no public URL). We’ll update this topic to confirm if/when the issue is resolved and, in the meantime, we’re happy to continue discussing/troubleshooting it here to further that resolution.

We believe this issue is limited to the deploy preview versions of sites at this time. Would you be willing to please confirm if you are seeing the same thing? Do you get the 503s for other types of deploys or only deploy preview versions of the function/site?

Hey @luke, It’s definitely not limited to previews for us, we are receiving them on both previews, and production.

You’re right, the exact wording in the filed issue is that it “mainly affects pull requests”. So, while the issue can affect production, it is more likely to occur with deploy preview versions.

We’ll update here once we have a resolution for the issue, as Luke mentioned. Thank you for your patience.
