Hasura Instance is Blocked by Netlify

We have been unable to call Netlify Functions from Hasura event webhooks since Thursday of last week.

  • Our Netlify site name is hopdrive-io-events
  • Connections from our US-East Hasura instance have been blocked for this site since 5pm EST Thursday.
  • Please enable connections from 34.227.3.174

We have confirmed this with Hasura engineers: they were unable to resolve the site from our instance server, but we are all able to resolve it from our own machines.

As a temporary workaround, I placed a Cloudflare Worker proxy in the middle, and with it we do get a response from the Netlify Function. This shows that the problem is simply a network block.

We also had a ticket about this in our helpdesk. Based on what I can see, there have been no blocks from our end. I can see traffic from that IP in our logs, all getting status 200 as a response (a total of 200k+ requests in the past 7 days). We would need more details from you (or Hasura), to understand why you think something was blocked from our end. What kind of tests were performed to reach this conclusion?

Hasura engineers ran tests from their cloud instances to try to reach the site and were unable to do so. This began around 5pm on Thursday.

Looking at the past 7 days may be misleading since the issue began Thursday. It would be better to look at the logs from Thursday onward.

It is also important to know that proxying through a different server fixes the problem. We are not making any changes to the payload in that proxy. The only difference is the source IP to the Netlify function.
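For anyone following along, a pass-through proxy of the kind described above could look roughly like the sketch below. This is not our actual Worker code. The names `UPSTREAM_ORIGIN`, `toUpstreamUrl` and `proxy` are made up for illustration, and the upstream host is taken from the error message later in this thread. The point is only that method, headers and body are forwarded unchanged, so Netlify sees the same request from a different source IP.

```typescript
// Assumed upstream origin (hypothetical constant; host taken from this thread).
const UPSTREAM_ORIGIN = "https://events.hopdrive.io";

// Keep the incoming path and query string, swap only the origin.
function toUpstreamUrl(incomingUrl: string, upstreamOrigin: string = UPSTREAM_ORIGIN): string {
  const incoming = new URL(incomingUrl);
  return new URL(incoming.pathname + incoming.search, upstreamOrigin).toString();
}

// Forward the request untouched: same method, headers and body.
// Passing the original Request as the init object copies those fields.
async function proxy(request: Request): Promise<Response> {
  return fetch(new Request(toUpstreamUrl(request.url), request));
}

// Cloudflare Workers module entry point.
export default { fetch: (request: Request) => proxy(request) };
```

With this in place, the Hasura webhook URL would point at the Worker's hostname instead of the Netlify site, with the same `/.netlify/functions/...` path.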

Again, this information is not much different from what’s already been shared. Hasura engineers ran their tests, and on our side we see ourselves responding with a 200 to that IP. This won’t lead anywhere. We need more details, such as the kind of tests performed and the results obtained.

Past 7 days also includes Thursday and as you can see:

There has been constant traffic. Even on Thursday, we could see about 100 requests an hour from that IP, all with status 200.

Okay, let me ask a different question… did it start working later, or does it still appear to be blocked? I’m asking because, based on logs for June 29, 2023, 16:30 to 17:30 EST, I can still see successful requests:

We have implemented a workaround through the edge proxy as I mentioned above. Seeing successful calls to the function is not necessarily proof there was not a problem.

Here’s the error message received on the Hasura side.

{
    "data": {
        "message": "{\"message\":\"Connection failure: Network.Socket.connect: <socket: 70>: does not exist (Network is unreachable)\",\"request\":{\"host\":\"events.hopdrive.io\",\"method\":\"POST\",\"path\":\"/.netlify/functions/db-users\",\"port\":443,\"queryString\":\"\",\"requestHeaders\":{\"Content-Type\":\"application/json\",\"User-Agent\":\"hasura-graphql-engine/v2.27.0-cloud.1\",\"X-B3-ParentSpanId\":\"d70c9bb013d299bc\",\"X-B3-Sampled\":\"1\",\"X-B3-SpanId\":\"85a84ce2bac2ac3f\",\"X-B3-TraceId\":\"2be71512e608350206687fa49d70146b\",\"passphrase\":\"EVENT_PASSPHRASE\"},\"responseTimeout\":\"ResponseTimeoutMicro 300000000\",\"secure\":true},\"type\":\"http_exception\"}"
    },
    "type": "client_error",
    "version": "2"
}
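The useful details in that payload are buried in a JSON string nested inside the outer JSON. A minimal sketch of pulling them out is below, assuming the shape shown above. The `HasuraHttpError` interface and both function names are illustrative and not part of any Hasura API, and the sample is a trimmed copy of the real message. Note that "Network is unreachable" is a socket-level connect failure, so no HTTP response was ever received.

```typescript
// Illustrative shape of the inner error, limited to the fields used here.
interface HasuraHttpError {
  message: string;
  type: string;
  request: { host: string; method: string; path: string; port: number; secure: boolean };
}

// The outer payload carries the real error as a JSON-encoded string,
// so it has to be parsed twice.
function decodeEventError(raw: string): HasuraHttpError {
  const outer = JSON.parse(raw) as { data: { message: string } };
  return JSON.parse(outer.data.message) as HasuraHttpError;
}

// True when the failure happened at connect time rather than as an HTTP status.
function isNetworkUnreachable(err: HasuraHttpError): boolean {
  return err.type === "http_exception" && err.message.includes("Network is unreachable");
}

// Trimmed sample built from the message above (headers and trace IDs omitted).
const sample = JSON.stringify({
  data: {
    message: JSON.stringify({
      message:
        "Connection failure: Network.Socket.connect: <socket: 70>: does not exist (Network is unreachable)",
      request: {
        host: "events.hopdrive.io",
        method: "POST",
        path: "/.netlify/functions/db-users",
        port: 443,
        secure: true,
      },
      type: "http_exception",
    }),
  },
  type: "client_error",
  version: "2",
});

const decoded = decodeEventError(sample);
console.log(`${decoded.request.method} https://${decoded.request.host}${decoded.request.path}`);
console.log("network unreachable:", isNetworkUnreachable(decoded));
```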

Well, if you’re highly determined to prove this was an issue on our end, there’s not much I can say. To investigate an issue, we need to see the problem on our end. I’m sorry I don’t know a better way of troubleshooting. Maybe someone else from my team would know better and they can chime in.

I understand you’re probably used to getting pretty silly requests over these forums and rarely is it ever really a Netlify problem, so I can understand your reluctance to engage and try to help me here.

What can I provide you in order to take this request seriously and to engage with the Netlify network engineers to look at the IP block?

The successful requests you are seeing are not coming from our Hasura instance (34.227.3.174). Again, as stated above, we have implemented a workaround to route traffic through an edge proxy on a Cloudflare Worker.

We are currently unable to connect to Netlify from 34.227.3.174

FYI we are about to run a cleanup job to process missed events since Thursday. With the proxy in place, we know that Netlify will not block the origin (Cloudflare workers instead of Hasura) so this will result in many successful calls in your logs.

I share this to only preempt the anticipated rebuttal that there is no networking problem when in fact our Hasura instance is still being blocked. We are only getting around the block by routing through a proxy.

I look forward to working with you still to get this resolved today before Monday high volumes return on our servers.

To start with, I’m sorry, you’re right. There are some blocks. An explanation why I thought there were none is provided below:

I checked further and while I can still see requests from that IP (to other sites, but I don’t see any request today), I did notice there were no requests from that IP to your site (that I got from the error message you shared above) and the timelines did match. So I checked even further and we are indeed blocking that IP from time to time which is why I’m still able to see the requests.

Here’s what’s happening: One of the sites on Netlify (the same site that I mentioned wrote into the helpdesk in my initial response) is receiving a lot of requests from that IP. This is resulting in the IP being banned, but we unban the IP after some time, which is why some requests end up succeeding.

I’ve reached out to that user to ask what exactly on their end is causing so many requests and whether they can control it. I advise you to use a different IP, or your current solution works too. As long as your IP is not shared with other users of the platform, I think it should be fine. But to be certain, how much traffic are you expecting when you say “high volumes”?


This is not related to the issue, but I thought it’s important to address it regardless:

I don’t treat any question as silly and I apologise if you got that idea through the conversation. We get several requests about IPs being blocked, and our typical workflow is checking if we’re serving traffic to that IP. If yes, that’s indicative of no block. This is the same thing I did here. Since I could see over 200k requests in the past 7 days, there did not seem to be an issue. It was only when I checked the logs for your specific site that I saw an issue and, as I mentioned before, seeing an issue on our end is enough to trigger a deeper investigation, which I have now done.

The only problem I do face quite often is the lack of information shared by the user. Without any identifying information, it’s not possible for me to check anything specific to their site. As soon as you shared that info, I was able to check further.

Also, there was no reluctance to engage here, but before finding out about the issue, if we were only going to engage in a war of words, it was going to lead to frustration on my end and a bad customer experience on your end. So, I thought it probably made sense for someone else to check the issue, as I was certainly unable to see it (for reasons explained above) and you seemed to be perfectly sure that there was an issue.

The use case is Hasura Events (database triggers > Webhooks) which means that the calls to the Netlify Functions will always come from the same IP address (from the Hasura instance). Here is some information about the system to help explain how it works: Event Triggers Overview | Hasura GraphQL Docs

Based on your response, I suspect we will still have a problem once volume returns to normal, even though we are proxying through the Cloudflare Workers. Also, since this is a critical part of our architecture, I am treating this potential IP blocking as a risk to our systems.

What options do I have to whitelist an origin IP for all of my sites? We would like to do that for our Hasura instances as they should never be blocked from calling our Netlify functions. I would also like to whitelist the Cloudflare Workers we are proxying through right now.

Lastly, can you provide a status update of the IP block? Were you able to remove it?

Thank you, I very much appreciate your thoughtful reply. Communicating over text based forums can be challenging at times to convey information in a good way that is not laden with unintended meaning.

Also, we typically see about 4,000 to 5,000 Hasura events per day. As our business volume increases, so too does the volume of calls to the Netlify Functions to process these events.

While I’m sure the IP ban can be lifted, I don’t think I’m authorised or have the rights to do that. This would most likely need a review from our Reliability team and none of them would be working on Sunday, I’m afraid.

I can probably also add your IP to the allowlist, but again, I’m not sure if I should be making CDN-level changes without consulting anyone.

As long as your IP is not being used by others, the limits are high enough for most use cases, including yours.

4k to 5k events per day should be fine, and well below the threshold. The limits are measured per second and are multiples of these numbers, so unless you’re expecting this count to grow by orders of magnitude, I’m sure you’ll be fine. If you’re still worried, I can ask the Reliability team to review your workflow and see if something can be arranged to allow your site to work fine, but such a request has not been made before and I’m unsure of the outcome of my ask.
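To put the daily figure on the same per-second scale as the limits, here is a quick back-of-the-envelope calculation. It only computes an average, so it says nothing about short bursts, and the actual thresholds are not stated in this thread. The function name is illustrative.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Average request rate if events were spread evenly across the day.
function averagePerSecond(eventsPerDay: number): number {
  return eventsPerDay / SECONDS_PER_DAY;
}

for (const perDay of [4000, 5000]) {
  console.log(`${perDay} events/day is about ${averagePerSecond(perDay).toFixed(3)} requests/second`);
}
// 4000 events/day is about 0.046 requests/second
// 5000 events/day is about 0.058 requests/second
```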

Yes, I agree that our volumes are relatively low (those are totals across all of our sites and the functions in each), and thus I am surprised that an IP ban was created. I suspect it was an automated measure to prevent DDoS, triggered by a small burst in a short time period or something similar.

I understand the reluctance to whitelist, so I trust your best judgement regarding that. My main concern is that an automated system will blacklist the Cloudflare edge origin and then we will be left without any way to successfully process these Hasura Events triggered from the DB.

If there is anything that can be done now to prevent that from happening Monday, to ensure we do not have an operational impact, then it will buy us some time to review with the Reliability team a more robust solution considering our use case.

As I mentioned, your requests were not the ones that triggered it. Those were someone else’s, who really went over those “high” limits.

Regarding the issue, I’ll PM you with some additional information.

Yes, I see what you mean. Since Hasura is a shared server environment, another tenant on our Hasura server has abused it causing the whole instance to get banned.

I am surprised Hasura Support has not seen more issues raised about this considering that fact. I will bring this up with them.

Has there been any progress on removing this IP block?

The block was automatically removed after some time and it should still be working right now, but that doesn’t mean it will continue working forever. All it would take is some other Hasura user spamming Netlify again and the IP would be temporarily blocked once more.

The Reliability team is not in favour of adding this IP to an allowlist, especially considering how easy it is to spam Netlify from there.