I am facing an untraceable issue with Netlify function timeouts. The function is called as a webhook from Sanity.io and frequently times out. It doesn't hit the expected 10s execution limit. Instead, it is completely unresponsive: it logs nothing, returns no errors, and does not seem to be executed at all.
The site in question is arthaus-tickets, hosted on tickets.arthaus.mt. This is a SvelteKit application with a number of API endpoints exposed via SvelteKit itself. As is standard for any SvelteKit installation, all of the endpoints are rolled into a single render function. The endpoint I'm mainly having trouble with is tickets.arthaus.mt/api/invoice, which is called as a webhook from Sanity.io. Multiple times over the past week or so (and with increasing frequency), this URL has been hit but has timed out. As far as I can tell, this is not the 10s execution limit: nothing is logged to the console (the function logs as soon as it is called), there is no 10s timeout error, no error payload is returned, and the connection times out after 30s. Calling the API manually from my local machine via Postman executes the same function perfectly well, logging and all.
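The reasoning above relies on the endpoint logging as soon as it is called. That only holds if the log line is written before any awaited work, so that a hang inside the handler cannot suppress it. A minimal TypeScript sketch of such a wrapper follows; WebhookCall, Handler, and withEntryLog are hypothetical names, not SvelteKit APIs, and a real endpoint would wrap its own POST handler instead.

```typescript
// Hypothetical, framework-agnostic stand-ins for an incoming webhook call and a handler.
type WebhookCall = { path: string; receivedAt: Date };
type Handler = (call: WebhookCall) => Promise<number>; // resolves to an HTTP status code

// Wraps a handler so the "[enter]" line is written synchronously, before the
// handler's first await. If the function is invoked at all, this line is logged
// even when the handler later hangs or throws.
function withEntryLog(handler: Handler, log: (line: string) => void): Handler {
  return async (call: WebhookCall): Promise<number> => {
    log(`[enter] ${call.path} at ${call.receivedAt.toISOString()}`);
    const started = Date.now();
    try {
      const status = await handler(call);
      log(`[exit] ${call.path} status=${status} after ${Date.now() - started}ms`);
      return status;
    } catch (err) {
      log(`[error] ${call.path}: ${String(err)}`);
      throw err;
    }
  };
}
```

In a deployed function, log could simply be console.log. With the wrapper in place, a webhook delivery that leaves no [enter] line at all most likely never reached the function.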
A separate endpoint handles Stripe webhook events, and I've noticed that these occasionally time out as well. It seems that the Stripe retry mechanism may leave longer gaps between attempts, which could explain why the timed-out Stripe webhooks eventually succeed, while the Sanity.io ones are either heavily delayed or fail completely.
This is proving impossible to debug: the logs contain no information about any of the failed runs, and the failures are sporadic (so it's not a misconfigured URL or something like that).
I'm quite at a loss as to where to look next.
I get an error saying the endpoint cannot accept GET requests, and I'm not sure what a POST (or any other method) should look like with data. Could you let us know how to reproduce the behaviour?
That's the issue: it's super hard to reproduce. The endpoint only accepts POST requests, but every POST I've made from my machine has gone through. A fair number of the webhook requests also go through, but some seem to be inexplicably swallowed by the void; I lost a few this morning. I've just checked, and the behaviour has repeated itself: a booking made at 16:44 UTC is trying to post the webhook payload but keeps timing out. I've attached the last two failed logs from Sanity's side.
As you can see, the request times out after 30s with no error returned. The function logs don't show anything either, although the same function runs whenever the SvelteKit site is accessed, so any of the logged runs could just be general page accesses.
When the API endpoint does run, it logs something like this:
Nothing indicates a failed execution; it's as if the network requests themselves aren't getting through. Could the IPs be hitting some sort of firewall or rate limiting? The requests should be originating from these IPs:
After I manually triggered the same endpoint through Postman, the function completed and returned a 200 status code in 3.11s.
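Since the question of what a POST with data should look like came up earlier, here is a hedged sketch of replaying a webhook delivery by hand, a scripted equivalent of the Postman call. The ReplayRequest type, buildReplayRequest, and the payload shape are hypothetical; Sanity's actual webhook body may differ. The commented-out fetch call aborts after 30 s to mirror the timeout seen on Sanity's side and prints the x-nf-request-id response header, which Netlify adds to the responses it serves.

```typescript
// Minimal request description, kept independent of DOM typings; spreads into fetch() options.
type ReplayRequest = {
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Builds a POST carrying a JSON payload, similar to a manual Postman call.
// The payload shape is hypothetical; Sanity's real webhook body may differ.
function buildReplayRequest(payload: unknown): ReplayRequest {
  return {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Usage (network call, not run here). AbortSignal.timeout needs Node 17.3+ or a modern browser.
// const res = await fetch("https://tickets.arthaus.mt/api/invoice", {
//   ...buildReplayRequest({ _id: "hypothetical-booking-id" }),
//   signal: AbortSignal.timeout(30_000),
// });
// console.log(res.status, res.headers.get("x-nf-request-id"));
```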
I manually triggered the webhook for another failed booking, and once I had done that, the Sanity webhook for that endpoint also succeeded (resulting in a double execution, which is not great).
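One way to make a manual replay safe against a late webhook retry is an idempotency check keyed on the booking. The sketch below assumes a hypothetical key (for example, the booking document's ID taken from the payload) and a hypothetical ProcessedStore interface. An in-memory Set only lives as long as one warm function instance, so a real deployment would need a persistent store with an atomic check-and-set.

```typescript
// Hypothetical record of keys whose side effects (e.g. sending an invoice) already ran.
// A Set<string> satisfies this interface, but only within a single function instance.
interface ProcessedStore {
  has(key: string): boolean;
  add(key: string): void;
}

// Runs `work` only if `key` has not been processed yet. The key is recorded after
// `work` succeeds, so a delivery that throws can still be retried later.
// Returns true if the work ran, false if the delivery was skipped as a duplicate.
function runOnce(store: ProcessedStore, key: string, work: () => void): boolean {
  if (store.has(key)) return false; // duplicate delivery: skip the side effect
  work();
  store.add(key);
  return true;
}
```

Recording the key after the work runs allows failed deliveries to be retried, at the cost of a small window for duplicates if two deliveries arrive concurrently; which way to lean depends on whether a missed invoice or a duplicate one is worse.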
Maybe Sanity isn't able to connect to us at all? Do you have any response headers from the Sanity dashboard that show x-nf-request-id, or maybe the IPs Sanity is using?
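The reasoning behind asking for x-nf-request-id can be written as a small classification of each delivery attempt recorded on the sender's side: a response carrying that header was served through Netlify, while a timeout with no response at all is consistent with the connection never arriving. The DeliveryAttempt shape, Verdict labels, and classify function are hypothetical; what Sanity's dashboard actually exposes may differ.

```typescript
// One delivery attempt as recorded by the sender (hypothetical shape).
type DeliveryAttempt = {
  status: number | null; // null when the sender gave up without any response
  headers: Record<string, string>; // response headers; empty when there was no response
};

type Verdict = "reached-netlify" | "no-response" | "response-without-netlify-id";

// A response with an x-nf-request-id header (matched case-insensitively) was served
// through Netlify; no response at all suggests the request never got through.
function classify(attempt: DeliveryAttempt): Verdict {
  if (attempt.status === null) return "no-response";
  const hasNetlifyId = Object.keys(attempt.headers).some(
    (name) => name.toLowerCase() === "x-nf-request-id",
  );
  return hasNetlifyId ? "reached-netlify" : "response-without-netlify-id";
}
```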
It seems that way, at least for the instances where the webhooks fail. The IP addresses are in my previous message. I'm afraid I don't have any request headers, and I'm not sure I'd be able to get them from Sanity, but I could try to get in touch with their support. Then again, the requests seem to time out completely; the failed ones have no body.
Can we start by checking for successful connections from the above IPs? If we can only find successful connections from one or two of them, that would indicate an issue between the remaining IPs and Netlify, which would explain why the issue is so sporadic.
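The per-IP check suggested above can be sketched as follows. It filters on the webhook path because, as noted earlier, the same render function serves every request to the site. The LogEntry shape and silentIps are hypothetical, and the real sender IPs from this thread are not reproduced; the usage line uses documentation-only placeholder addresses.

```typescript
// One access-log record (hypothetical shape; real Netlify or Sanity logs may differ).
type LogEntry = { ip: string; path: string; status: number };

// Counts successful (2xx) requests to `path` per expected source IP and returns the
// IPs that never produced one, in the order given; these are the candidates for a
// connectivity problem.
function silentIps(entries: LogEntry[], expectedIps: string[], path: string): string[] {
  const successes = new Map<string, number>(expectedIps.map((ip): [string, number] => [ip, 0]));
  for (const e of entries) {
    if (e.path !== path || e.status < 200 || e.status >= 300) continue;
    const count = successes.get(e.ip);
    if (count !== undefined) successes.set(e.ip, count + 1); // ignore unexpected IPs
  }
  return expectedIps.filter((ip) => successes.get(ip) === 0);
}

// Usage with placeholder addresses (substitute the real sender IPs):
// silentIps(entries, ["192.0.2.10", "198.51.100.7"], "/api/invoice");
```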
I've done some more research on my end, and it seems that events from the two 35.x IPs aren't coming in. I'm not sure whether this is a coincidence or part of the problem, given I've only been gathering this additional data for a day.
Any updates on this please?
Sorry for the delay. It would appear that Sanity's IPs might be getting blocked, but we don't know why yet. We'll confirm that with the devs and let you know.
After checking with the devs, we can reliably say that when we see one of those IPs (
126.96.36.199 have no blocks, as we also don't see them at all. No connection attempts from those IPs have reached Netlify.
I'll follow up with Sanity and see whether there are any issues on their end, although they've already said that everything is OK on their side.