Build intermittently fails to prepare repo (timeout, exit status 128)

Hello,

I’m having an issue at the beginning of the build process, during the stage ‘preparing repo’, where the connection to our self-managed GitLab instance times out sometimes.

I stress the “sometimes” because some builds do pass, and I cannot tell what differs between the builds that work and the ones that don’t.

Netlify site name: bcul-toolbox

Log of a build where the issue happens:

1:09:46 PM: build-image version: a2d22d22e4555d1ef0a972ed14a0a4b366ad20c4 (focal)
1:09:46 PM: build-image tag: v4.16.3
1:09:46 PM: buildbot version: 9a6b4d0d37eb2a90e2c482e1d6cfe9a0793e6262
1:09:46 PM: Fetching cached dependencies
1:09:46 PM: Starting to download cache of 115.6MB
1:09:47 PM: Finished downloading cache in 1.063823507s
1:09:47 PM: Starting to extract cache
1:09:49 PM: Finished extracting cache in 1.674892752s
1:09:49 PM: Finished fetching cache in 2.794568427s
1:09:49 PM: Starting to prepare the repo for build
1:12:00 PM: User git error while checking for ref refs/heads/feat/label-range-search
1:12:00 PM: Failing build: Failed to prepare repo
1:12:00 PM: Failed during stage 'preparing repo': error checking for ref: : ssh: connect to host gitlab.liip.ch port 22: Connection timed out

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
: exit status 128
1:12:00 PM: Finished processing build request in 2m13.939729416s

Build settings: (screenshot not preserved)

What we already tried

Our first guess was that the issue comes from our self-managed GitLab instance. We asked our service provider to look into it, but they didn’t find any failed attempts in the logs.

Here’s their analysis:

We investigated the issue and it seems that connections to the gitlab server weren’t established.

IP tables should not block the connections via SSH:

1287K 80M ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 multiport dports 22 /* 010 ssh accept tcp/v4 */

The connection attempt wasn’t logged on the GitLab server, at least not on 04.01 when you reported the ticket, assuming that you attempted to connect on that day. Failed attempts should be visible in the gitlab-shell logs or sshd logs, which also didn’t show any results.

This may indicate that the connection from Netlify to the GitLab server was never established, but we can only judge the issue without hints such as the IP address or the date/time of the attempts.

As you mentioned, it is possible that Netlify didn’t accept the GitLab host key, but it’s hard to tell without logs that provide such information.

We also checked the topic Frequently encountered problems during builds.

Enabling the debug mode didn’t provide additional information during the ‘preparing repo’ stage. More logs appear only once the build gets past this step.

And as the build does sometimes pass, I understand that the permission to access the repository is set correctly. This rules out the most common root cause for the error 128.

Please let me know if there’s any additional information that might be useful to figure this out.

Thank you in advance for your help.

I don’t think Self-hosted Git is even supported on Starter plan. It seems to be a Business plan feature.

@hrishikesh Indeed, setting up a new site hosted on a self-managed GitLab instance is not possible from the Netlify UI.

Therefore, we went for a manual setup using the CLI, with netlify init --manual, so that the repo is considered as a custom Git provider. Going through the steps of the CLI command, it all went well and the deploy key as well as the webhook have been set up properly. Indeed, when a commit is pushed to the repo, the build is triggered on Netlify (webhook works). And sometimes, the build goes through successfully (deploy key is set up correctly).

What’s puzzling is why it so often times out when trying to reach our Git remote (~3 out of 4 attempts fail).

Would you happen to have more logs that could provide us with additional information?

Hi @david.s, we don’t have more logs to provide. If your build is timing out, you may need a build timeout increase. Also, please make sure we have permission to clone the repository you are trying to deploy. The usual cause of the 128 error is that someone changed the settings or the repository some time after linking the repository to your site.

Hi @SamO! Thanks for your reply.

If your build is timing out you may need a build timeout increase.

The part that times out is the SSH connection to the git repo. From the timestamp, we can see that the timeout is at about 2 minutes:
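As a rough check of that figure, here is a minimal sketch that computes the gap between the two relevant timestamps in the failing log from the first post (“Starting to prepare the repo for build” and “User git error while checking for ref”). The helper name `elapsed_seconds` is only for illustration, and both timestamps are assumed to fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps such as '1:09:49 PM' (same day assumed)."""
    fmt = "%I:%M:%S %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the failing build log in the first post.
stalled_for = elapsed_seconds("1:09:49 PM", "1:12:00 PM")
print(f"'preparing repo' stalled for {stalled_for:.0f} s")  # 131 s
```

That is 2 minutes 11 seconds spent in the stage before the SSH error is reported.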

I understand that the Netlify build timeout is at 15 minutes, so increasing that value shouldn’t change anything, right? Moreover, when the build succeeds, then it takes only about 2 seconds to get to the next step.

Also please make sure we have the permission to clone the repository you are trying to deploy

We can see in the screenshot below that the build does sometimes succeed, without me changing anything in the settings. Do you think it could still be a permission issue?

hi @david.s

I took a look at this, and it seems we do try to reach the remote server, as the build logs show:

6:04:28 AM: Failed during stage 'preparing repo': error checking for ref: : ssh: connect to host gitlab.liip.ch port 22: Connection timed out

This points to some temporary unavailability of the remote server. Is this remote GitLab instance protected or behind a firewall?

Probably not, since I can reach it from my own laptop:

❯ nc -zv gitlab.liip.ch 22
Connection to gitlab.liip.ch port 22 [tcp/ssh] succeeded!

There’s not much we can do to help here. Can you try to add a test site that builds at the exact same time from a different repo somehow? Different sites build from different clusters (and definitely from different Docker containers), so that may be worth exploring.

Thanks @gualter

Indeed, the GitLab instance is not protected. I’ll still check with our infra provider if there could be something else that could be partially blocking the requests.

We do have other sites linked to other repos on that same GitLab instance, and we see the same behaviour: some deployments go through, and some time out when trying to SSH to GitLab.

The other sites in question:

  • pefo-website
  • vp2050

Would it be possible to know which command Netlify uses at the “preparing repo” stage, so that we can try to reproduce the issue on our end?

I found that there isn’t a specific Netlify IP range from which the requests are made. But is there a way to know the IP of the server at the time when a specific build was run on it?

You can do that by running one of the commands described at https://www.cyberciti.biz/faq/ubuntu-linux-determine-your-ip-address/ as part of your build command, but that only works when the build reaches the building stage instead of failing during ‘preparing repo’.

During preparing repo, I think we simply run git checkout to get to the correct branch/PR.
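The exact command Netlify runs at this stage is not confirmed in this thread. The log line “error checking for ref” suggests a ref lookup against the remote, so the sketch below uses `git ls-remote --exit-code` as an assumed stand-in for it. The helper names `check_ref` and `failure_rate` are hypothetical, as are the repository path placeholder in the usage comment and the 150-second timeout.

```python
import subprocess


def check_ref(remote, ref, timeout_s=150.0, run=subprocess.run):
    """Return True if `git ls-remote` reaches `remote` and finds `ref`."""
    try:
        result = run(
            ["git", "ls-remote", "--exit-code", remote, ref],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer within timeout_s
    # 0: ref found; 2: remote reachable but no matching ref; 128: fatal error.
    return result.returncode == 0


def failure_rate(results):
    """Fraction of failed attempts in a list of booleans."""
    return results.count(False) / len(results)


# Hypothetical usage (the repository path is a placeholder, not a real value):
# results = [check_ref("git@gitlab.liip.ch:<group>/<project>.git",
#                      "refs/heads/feat/label-range-search") for _ in range(20)]
# print(f"{failure_rate(results):.0%} of attempts failed")
```

Run from a laptop or another server, this only exercises that machine’s network path to the GitLab host, so a clean result there does not rule out a problem on the path from Netlify’s build hosts.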

@hrishikesh Indeed, having the IP address for successful deployments will not be very helpful for investigating the ones that fail.

However, I noticed that since the end of last week, the error I’m getting in the logs has changed.

Before, we had the following:

Failed during stage 'preparing repo': error checking for ref:
: ssh: connect to host gitlab.liip.ch port 22: Connection timed out

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
: exit status 128

And now we have the following:

Failed during stage 'preparing repo': error checking for ref:
: ssh: connect to host gitlab.liip.ch port 22: Cannot assign requested address

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
: exit status 128

The difference being that we’re getting Cannot assign requested address instead of Connection timed out.

Could it be that something changed on Netlify’s end, which brings up this new error?
Note that it still takes two minutes before we get the error, so it appears that we still hit a timeout.
Moreover, the erratic behaviour keeps going, as we also still have successful git connections that lead to successful builds/deployments.

Asking about this new error to our infra provider, they told me the following:

The new error code sounds strange. “Cannot assign requested address” would imply (from what I found online) that they were trying to assign themselves an address on their own infrastructure, which then failed. This would mean there is an error at Netlify, which we cannot do anything about.

Could this new error help you figure out the root cause of the issue?

Hey @david.s,

We will confirm this with the devs and circle back as soon as we have more info.

Hi David,

The error Cannot assign requested address is caused by our failure to connect to your Git host over IPv6. Git is trying IPv6 because connecting over IPv4 failed. Our build system does not support IPv6 connectivity, so the real issue is that we are not able to connect to your Git provider over IPv4 (the timeout issue).
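The fallback described above can be illustrated with a minimal sketch: try every address the hostname resolves to, in order, and record the error from each attempt. On a machine without IPv6 connectivity, the IPv4 attempt would show the underlying failure (the timeout) and the IPv6 attempt would fail last, so its error is the one that surfaces. The function name `try_all_addresses` and the 10-second timeout are assumptions for illustration; this is not Netlify’s or OpenSSH’s actual code.

```python
import socket


def try_all_addresses(host, port=22, timeout_s=10.0,
                      resolve=socket.getaddrinfo, connect=None):
    """Try each resolved address in order; return (connected, attempts)."""
    if connect is None:
        def connect(family, sockaddr):
            # Plain TCP connect, closed immediately on success.
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout_s)
                sock.connect(sockaddr)

    attempts = []
    for family, _type, _proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        name = socket.AddressFamily(family).name
        try:
            connect(family, sockaddr)
        except OSError as exc:
            # Keep going: the next address may use another address family.
            attempts.append((name, sockaddr[0], exc.strerror or str(exc)))
        else:
            attempts.append((name, sockaddr[0], "ok"))
            return True, attempts
    return False, attempts


# Hypothetical usage:
# connected, attempts = try_all_addresses("gitlab.liip.ch")
# for family, address, outcome in attempts:
#     print(family, address, outcome)
```

Listing the per-address outcomes separates the IPv4 result from the IPv6 one, which a single final error message does not.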

This smells like a networking problem. Is it possible there is some WAF (web application firewall) sitting in front of your Git provider? Requests from our build system will come from one of three IP addresses, so is it possible there is some rate limiting/DoS protection being triggered? That could help explain the sporadic successes you’re seeing.

Let me know if you think it would be helpful to set up a conversation between myself and your Git provider to try to continue to diagnose the issue.

Thanks,
Alex

@akahn Thanks for your reply.

It is indeed possible that there is some rate limiting or IP blocking. I initially found that there isn’t a specific Netlify IP range. From your reply, I understand that there is actually one now. Would it be possible to know the three IP addresses from which the requests may come? It would definitely help confirm or rule out IP blocking as the root cause.

I’ve also asked our provider about your suggestion to get in touch directly with you.

Although these IPs are fairly stable right now, they will change in the future, so allowlisting them will not be a solid solution. My apologies for the insinuation.

@akahn Thanks for the clarification. May we still get the current IP addresses, just to check whether they are currently being blocked?

Then, if not by IP, would there be a long-term way to identify and allow requests coming from Netlify?

I don’t think we’ll be able to share those IPs, David, so let’s approach this differently:

To accomplish your goal of identifying a request as a Netlify request, you’d need to build something to identify our requests, perhaps by appending a query parameter to the URLs you fetch and verifying that query parameter before serving the files or allowing the connection? I’m not sure what is convenient for your other services that we connect to; you’ll know best.

Or, if that doesn’t meet your needs, you could build something in your own CI systems, where you can control in more detail how they connect to your other systems, and then deploy from there via our CLI instead? Getting Started with the Netlify CLI

@fool Thanks for the query parameter idea. It could indeed help identify the request. However, I understand that the request is prepared and sent by Netlify’s systems, and I don’t see how we could hook into it to add a query parameter.

Anyway, by checking further with our infra provider, we finally found that they were actually blocking one IP address that is among the ones used by the Netlify build system.

Unblocking it appears to have solved the issue we were experiencing.

So, in the end, nothing wrong with Netlify.

Thank you very much for your availability, patience and help to solve that issue.

You’re welcome, @david.s.

We’re glad to help.

If you need anything else, let us know.