Webhooks (deploy notifications) being disabled due to non-200s

I have a webhook that is set up to let me know when my site has been successfully deployed.

However, Netlify intermittently disables it, with this message:

1 hook has been disabled for failing 6 times in a row
Outgoing hooks are disabled when the service to notify responds with an HTTP error status (4xx/5xx) repeatedly. To re-enable a disabled hook, edit the existing notification.

This would be because my service is rejecting the request: it only accepts requests that are Netlify webhook events and that carry a correct signature.

This has happened again today, but I can’t see anything in the Netlify UI as to why this has failed.

Would I be able to get some support from your side to find out what’s gone wrong? The site is www-jvt-me.
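For readers wondering what the signature check mentioned above might involve, here is a minimal sketch. It assumes the signature arrives as a compact HS256 JWS (for example in a request header) whose claims carry `iss` and a `sha256` hex digest of the request body. That format is an assumption here and should be checked against Netlify's current documentation. The names `verify_signature` and `sign_body` are hypothetical, and `sign_body` exists only to produce test tokens locally.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment: str) -> bytes:
    # JWS segments omit base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_body(body: bytes, secret: str) -> str:
    """Build a token in the ASSUMED format. For local testing only."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = _b64url_encode(
        json.dumps({"iss": "netlify", "sha256": hashlib.sha256(body).hexdigest()}).encode()
    )
    mac = hmac.new(secret.encode(), f"{header}.{claims}".encode(), hashlib.sha256)
    return f"{header}.{claims}.{_b64url_encode(mac.digest())}"


def verify_signature(token: str, body: bytes, secret: str) -> bool:
    """Return True only if the token is validly signed and matches this body."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
        received = _b64url_decode(sig_b64)
    except ValueError:
        # Wrong number of segments, bad base64, or bad JSON.
        return False
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return False
    if header.get("alg") != "HS256":
        return False
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{claims_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, received):
        return False
    # The signed claims must describe exactly the body that was received.
    body_digest = hashlib.sha256(body).hexdigest().encode()
    claimed_digest = str(claims.get("sha256", "")).encode()
    return claims.get("iss") == "netlify" and hmac.compare_digest(claimed_digest, body_digest)
```

A check like this fails for anything that is not a signed deploy event, so unrelated traffic hitting the same URL would be rejected as well.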

Hey @jamietanna,
Apologies for the slow response and thanks for bearing with us! I looked into our logs for your deploys and confirmed that we triggered many webhooks, but unfortunately we don’t have great visibility into webhook requests and responses themselves. I think this is something we’ll have to bring to our backend engineers to see if they know how we might be able to check out the payload that’s sent to your service, if the failures were network issues (unlikely if this is happening often), or if they have any other insights.

I do see that you deploy to production several times a day, sometimes in quick succession, and I wonder if our hooks are tripping over themselves.

One follow-up question for you: is this happening every day, or a few times a week in batches?

Hi Jen,

It’s a shame there’s not much visibility on your side. It’d be great if, as on other platforms (e.g. GitLab), a user were able to inspect previous webhook requests and responses, so they could look into why things have gone wrong.

It appears to be happening several times a week, and depending on how many builds on my site occur, sometimes multiple times in a day.

I’ve added some logging on my side, and I am seeing a few cases where I return a 302. I use that to indicate that a POST to my endpoint was rejected, without wanting to trigger Netlify’s error handling. As the error above says a hook is only disabled for 4xx/5xx response codes, I expected this to be fine, but could it be that this is not working?

I’ve also seen the following being received multiple times (which seems to roughly correlate with disabling of the webhook), can you double check if this is anything that may be on your side?

{"action":"coreui_Component","data":[{"filter":[{"property":"repositoryName","value":"*"},{"property":"expression","value":"1==1"},{"property":"type","value":"jexl"}],"limit":50,"page":1,"sort":[{"direction":"ASC","property":"name"}],"start":0}],"method":"previewAssets","tid":18,"type":"rpc"}

I’ll continue to monitor my logs for anything odd.

Hey @jamietanna, just a quick note that Jen is out on PTO this week, but I will make sure they see this and pick it up with you when they are back. Thanks for your patience.

Hi @jamietanna,

Do you happen to know how long your endpoint takes to respond to our hook? Another possibility is that when multiple hooks hit your endpoint, the response takes longer than ~28 seconds, which will cause our system to close the connection with a 503. Let me know if that is something you observe and, like @jen said, we’ll have to investigate further from there.

Thanks!
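If slow responses turn out to be the cause, one common way to stay well under a timeout like the ~28 seconds mentioned above is to acknowledge the POST immediately and do the real work on a background thread. The following is only a sketch using the Python standard library. The names `handle_deploy_event`, `HookHandler`, `start_worker` and `serve` are hypothetical, and the 202 status is just one reasonable 2xx choice.

```python
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

jobs = queue.Queue()   # raw request bodies waiting to be processed
processed = []         # stand-in for whatever the real work produces


def handle_deploy_event(body: bytes) -> None:
    """Hypothetical slow work (sending notifications, rebuilding caches, ...)."""
    processed.append(body)


def _worker() -> None:
    while True:
        body = jobs.get()
        try:
            handle_deploy_event(body)
        except Exception as exc:  # keep the worker alive if one event fails
            print(f"deploy event failed: {exc!r}")
        finally:
            jobs.task_done()


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        jobs.put(self.rfile.read(length))
        # Reply before any slow work happens, so the sender never waits on it.
        self.send_response(202)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the worker and block, serving hooks on the given (assumed) port."""
    start_worker()
    HTTPServer(("", port), HookHandler).serve_forever()
```

Calling `serve()` runs the endpoint. The trade-off is that the sender only learns the event was received, not whether processing succeeded, so failures need to be logged on the receiving side.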

Hi @Dennis, apologies for the late reply.

Unfortunately I don’t. I’ve added some extra logging on my side, and I do see myself returning HTTP 302s to http.cat/400, which was my way of avoiding Netlify’s webhook failing on non-2xx responses. But it appears that it may be failing on a 302, too?

These 302s are error cases, so I’m going to have a look at the request/response and see if there’s a bug on my side, but if there’s anything in the meantime you can track or make visible on your side, that’d be super appreciated.

Our webhooks don’t follow redirects. Please make sure the destination is accepting POSTs directly, and returning a 2xx response code within 28 seconds.
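As an illustration of the advice above, here is a rough sketch of an endpoint policy that never redirects: payloads that are not usable are logged and still answered with a 2xx. The check for a `state` field is an assumption about what a deploy event looks like, and `rejection_reason` and `status_for` are hypothetical helper names.

```python
import json
import logging
from typing import Optional

log = logging.getLogger("deploy-hook")


def rejection_reason(body: bytes) -> Optional[str]:
    """Return why a payload is unusable, or None if it looks like a deploy event.

    ASSUMPTION: a deploy event is a JSON object with a "state" field.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return "body is not valid JSON"
    if not isinstance(payload, dict):
        return "body is not a JSON object"
    if "state" not in payload:
        return "missing 'state' field"
    return None


def status_for(reason: Optional[str]) -> int:
    """Pick the HTTP status to send back. Always a 2xx, never a redirect."""
    if reason is None:
        return 200
    # Record the rejection locally instead of signalling it to the sender.
    log.warning("ignoring webhook payload: %s", reason)
    return 202
```

With this approach an unrelated request, such as the `coreui_Component` RPC body quoted earlier in the thread, would be logged and ignored rather than answered with a 302.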

Interesting, thanks @fool, I’ll give that a go.

If that’s true, is it possible to make sure that the documentation makes that clear? As per the original message:

Outgoing hooks are disabled when the service to notify responds with an HTTP error status (4xx/5xx) repeatedly.

Is not correct, as this would also mean that 3xxs count to disabling the webhook.

No. 3xx’s just don’t work as you intended in that we delivered the POST to someone who wasn’t listening - they don’t however count as failures from our PoV, and don’t lead to disabling.

Since making the change to use non-3xx responses, I don’t appear to have had any more webhooks disabled. I’ve not pushed as many commits to my site recently, though, so I’m going to continue monitoring this and will keep you posted!
