Synchronous function time limitation: why is it real world time instead of execution time?

From the Background Functions doc:

Background Functions allow you to set up serverless function processes that run longer than 10 seconds

It seems to imply a 10-second time limit for synchronous functions. However, it is unclear whether the limit is on execution time (in which case it would be safe to await a slow upstream call without timing out) or on real, wall-clock time (in which case the function will time out regardless).

I deployed a lambda function to test it out, like this:

import type { Event, Response } from './lib.netlify.functions'

export const handler = async (event: Event): Promise<Response> => {
  const start = new Date()
  // Sleep for 20 seconds, well past the documented 10-second limit,
  // while doing no actual work on the CPU.
  await new Promise<void>(cb => setTimeout(cb, 20 * 1000))
  const end = new Date()
  return { statusCode: 200, body: JSON.stringify({ start, end }) }
}

It returned an error:

< HTTP/2 502 
< cache-control: no-cache
< date: Thu, 28 Jan 2021 21:42:33 GMT
< content-length: 115
< content-type: text/plain; charset=utf-8
< age: 10
< server: Netlify
< x-nf-request-id:
< 
* Connection #0 to host example.com left intact
{"errorMessage":"2021-01-28T21:42:33.310Z 7fc54fca-fb29-4f2c-90a8-4a9161d27e3b Task timed out after 10.01 seconds"}
* Closing connection 0

So the time limit is on real (wall-clock) time.

So here is the problem: if a function calls an upstream service, its own execution time can be extremely short even though it awaits for a long period, and that waiting costs AWS essentially nothing. Why can't AWS make it a limit on execution time instead?

Hey @yw662 :wave:t2:

If a function (Function A) calls another function upstream (Function B) and waits for the result, Function A is still sitting in a holding pattern. Put another way, it's still using real-time AWS resources to maintain the Node.js instance that's eating CPU cycles doing effectively nothing :stuck_out_tongue: but it's still very much running: the context and process are both being held so that whenever Function B responds, Function A can continue with the rest of its code.

That is execution time — time the machine spends holding that process running. The function is still very much executing, but you’ve programmed its execution to be waiting on a dependent result.

In the realm of software performance testing, we split “execution time” from “real time” only because modern operating systems are extremely process-parallel and the process you’re actively testing can be put on hold for milliseconds at a time while another process swaps in. In that case, you want “execution time” not “real time” because the machine is spending active time running other processes while yours is on hold. AWS doesn’t really operate this way, but the 10 second cap is indeed on execution time either way.
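
To make that concrete, here is a minimal sketch (not from the thread) of a synchronous Function that awaits an upstream call. The await counts against the wall-clock limit, so the sketch races the call against its own guard timer and answers before the platform returns its own 502. UPSTREAM_URL is a hypothetical placeholder, the types are reused from the question's local typings file, and a runtime with a global fetch (Node 18+) is assumed.

import type { Event, Response } from './lib.netlify.functions'

const UPSTREAM_URL = 'https://upstream.example.com/api' // hypothetical placeholder

export const handler = async (event: Event): Promise<Response> => {
  // The await below counts against the wall-clock limit, so race the upstream
  // call against a 9-second guard and answer before the 10-second cutoff.
  const guard = new Promise<'timeout'>(resolve =>
    setTimeout(() => resolve('timeout'), 9 * 1000))

  const call = fetch(UPSTREAM_URL, { method: 'POST', body: event.body })
    .then(res => res.text())

  const result = await Promise.race([call, guard])
  if (result === 'timeout') {
    return { statusCode: 504, body: 'upstream still processing, retry later' }
  }
  return { statusCode: 200, body: result }
}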

All that said, here's what I'd recommend: figure out if you actually need the response value of your upstream function. If you don't, just make the upstream function a Background Function: Background Functions return a 200 immediately and queue the function body to execute asynchronously for up to 15 minutes, but you can't get the function's official response anywhere (you need to monitor via outside sources; there's a sketch of this just below).

If you do need the upstream function's response value, then you'll need to figure out how to better parallelize your requests, or find a different architecture for your client/function setup, so as to better manage the time limit. Lambda functions (aside from the "Background" variant) aren't made to be long-running. That's not what they were built for.
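
For reference, the Background Function option mentioned above could look roughly like this sketch. It assumes Netlify's convention that the file name ends in -background (e.g. netlify/functions/call-upstream-background.ts); the return value is never delivered to the caller, so the result has to be persisted somewhere the client can read it later. UPSTREAM_URL and the persistence step are hypothetical placeholders, and a runtime with global fetch is assumed.

import type { Event } from './lib.netlify.functions'

const UPSTREAM_URL = 'https://upstream.example.com/api' // hypothetical placeholder

// Runs asynchronously for up to 15 minutes; the caller only gets an immediate acknowledgement.
export const handler = async (event: Event): Promise<void> => {
  const res = await fetch(UPSTREAM_URL, { method: 'POST', body: event.body })
  const result = await res.text()
  // Persist the result somewhere the client can read it later (DB, blob store, webhook, ...).
  console.log('upstream finished, result length:', result.length)
}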

Hope that helps / gives some insight

–
Jon

Sorry, I don't understand. The Node.js instance does eat memory, but how would it be eating CPU when its own event loop is empty? It doesn't eat any on my local machine or cluster: it runs inside multiple k8s pods and uses no CPU when idle.
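
To illustrate that point, a quick local measurement could look like this (a sketch, not from the thread; process.cpuUsage() is a standard Node.js API that reports user/system CPU time in microseconds):

const main = async () => {
  const cpuStart = process.cpuUsage()
  const wallStart = Date.now()

  // Idle await: nothing is scheduled on the event loop for 20 seconds.
  await new Promise<void>(cb => setTimeout(cb, 20 * 1000))

  const cpu = process.cpuUsage(cpuStart)  // CPU time spent since cpuStart, in microseconds
  const wallMs = Date.now() - wallStart   // roughly 20000 ms of wall-clock time
  console.log({ cpuMs: (cpu.user + cpu.system) / 1000, wallMs })
  // Typically prints a few milliseconds of CPU time against ~20 s of wall time.
}

main()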

Yes, that is why I can't make it a Background Function.

Maybe my only solution here is to move the whole function upstream and let the Netlify function just respond with a 308 redirect.
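
A minimal sketch of that 308 fallback (not from the thread), assuming a hypothetical UPSTREAM_URL and reusing the question's local typings (so field names like event.path may differ). Any headers the wrapper used to inject, such as Authorization, would then have to be handled by the client or the upstream service instead.

import type { Event, Response } from './lib.netlify.functions'

const UPSTREAM_URL = 'https://upstream.example.com' // hypothetical placeholder

export const handler = async (event: Event): Promise<Response> => {
  return {
    statusCode: 308, // Permanent Redirect: the client repeats the same method and body against Location
    headers: { Location: UPSTREAM_URL + event.path },
    body: '',
  }
}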

Sure, in a technical sense you can make the case that it's not using as many resources, but at the end of the day, even when a function is waiting on I/O it's still using up rented time from AWS Lambda. The only way you stop renting that time is when the base call stack returns. Period. That's just how Lambda works.

Then you're going to need some other means of resource management and a different architecture. Happy to help you through this if you can provide some context as to what exactly you're trying to do with all these functions :slight_smile:

–
Jon


Thank you, but I run a neural network service upstream, and I use this function as a wrapper that calls the service and returns whatever it returns.
The service definitely needs more than 10s to give back a result.
The function itself modifies certain fields of the request body, checks which API to call, and passes an authorization header.

So I cannot think of another way to make it work.

Long-running processes kicked off from, and monitored from, web requests are pretty common. I’d recommend this pattern:

If you're running your own service that you built, perhaps you could add a layer around it that accepts new processing requests and returns a processing-request ID immediately. The Function then returns that processing-request ID to the client. The client app can check in every 2-5 seconds on the status of the processing request and, once it's done, grab the actual data. This is a common pattern for long-running processes that doesn't take up web-server time.
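
A minimal sketch of that pattern (not from the thread), assuming a hypothetical upstream that exposes POST /jobs (returning { id }) and GET /jobs/:id (returning the job status), and reusing the question's local typings; in practice the two handlers would live in separate Function files.

import type { Event, Response } from './lib.netlify.functions'

const UPSTREAM = 'https://upstream.example.com' // hypothetical placeholder

// submit.ts: forwards the request to the upstream queue and returns a job ID well under 10 s.
export const submit = async (event: Event): Promise<Response> => {
  const res = await fetch(`${UPSTREAM}/jobs`, { method: 'POST', body: event.body })
  const { id } = (await res.json()) as { id: string }
  return { statusCode: 202, body: JSON.stringify({ id }) }
}

// status.ts: the client polls this every few seconds with ?id=... until the job reports done.
export const status = async (event: Event): Promise<Response> => {
  const id = event.queryStringParameters?.id // field name follows the question's local typings
  if (!id) return { statusCode: 400, body: 'missing id' }
  const res = await fetch(`${UPSTREAM}/jobs/${id}`)
  return { statusCode: 200, body: await res.text() }
}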


Thanks for the question, @yw662!

The big difference between regular synchronous Functions and Background Functions is that synchronous functions need to keep an HTTP connection open so the function's response can be delivered directly to the client. The time limit for synchronous functions refers to the time that this connection remains open. By default, it's set to 10 seconds, because the vast majority of synchronous functions finish well under this, and because the longer this connection is open, the more possibility there is for inconsistencies to occur - a dropped connection, a closed tab, etc.

However, we also understand that it's not always easy to convert a long-running synchronous call to an asynchronous one. For customers on Pro teams or above, our support team can increase your function's timeout to up to 26 seconds, which is the longest period during which our proxy system can reliably keep the connection open. (Note that while Netlify Functions provide a proxy gateway automatically, if you were doing this with straight AWS, you'd need to set up their API Gateway product. API Gateway's time limit is similar to ours, at 30 seconds.)

If 26 seconds still isn’t enough, or if you’d like to provide a more consistent user experience within the ~5-25 second range, you can use a polling method like @jonsully suggested, or connect with a pub/sub API like Pusher or Ably.
