Build intermittently fails to prepare repo (timeout, exit status 128)

Hello,

I’m having an issue at the beginning of the build process, during the ‘preparing repo’ stage: the connection to our self-managed GitLab instance sometimes times out.

I emphasize the “sometimes” because some builds do pass, but I cannot tell what’s different between the builds that succeed and the ones that fail.

Netlify site name: bcul-toolbox

Log of a build where the issue happens:

1:09:46 PM: build-image version: a2d22d22e4555d1ef0a972ed14a0a4b366ad20c4 (focal)
1:09:46 PM: build-image tag: v4.16.3
1:09:46 PM: buildbot version: 9a6b4d0d37eb2a90e2c482e1d6cfe9a0793e6262
1:09:46 PM: Fetching cached dependencies
1:09:46 PM: Starting to download cache of 115.6MB
1:09:47 PM: Finished downloading cache in 1.063823507s
1:09:47 PM: Starting to extract cache
1:09:49 PM: Finished extracting cache in 1.674892752s
1:09:49 PM: Finished fetching cache in 2.794568427s
1:09:49 PM: Starting to prepare the repo for build
1:12:00 PM: User git error while checking for ref refs/heads/feat/label-range-search
1:12:00 PM: Failing build: Failed to prepare repo
1:12:00 PM: Failed during stage 'preparing repo': error checking for ref: : ssh: connect to host gitlab.liip.ch port 22: Connection timed out

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
: exit status 128
1:12:00 PM: Finished processing build request in 2m13.939729416s

Build settings: (screenshot)

What we already tried

Our first guess was that the issue came from our self-managed GitLab instance. We asked our service provider to look into it, but they didn’t find any failed connection attempts in the logs.

Here’s their analysis:

We investigated the issue, and it seems that the connections to the GitLab server were never established.

IP tables should not block the connections via SSH:

1287K 80M ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 multiport dports 22 /* 010 ssh accept tcp/v4 */

The connection attempt wasn’t logged on the GitLab server, at least not on 04.01, when you reported the ticket, assuming that you attempted to connect on that day. Failed attempts should be visible in the gitlab-shell logs or the sshd logs, but neither showed any results.

This suggests that the connection from Netlify to the GitLab server was never properly established, but we are judging the issue without any hints such as an IP address or the date/time of the attempts.

As you mentioned, it is possible that Netlify didn’t accept the GitLab host key, but it’s hard to tell without logs that provide such information.
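For completeness, the kind of checks this involves on the GitLab host would look roughly like this (a sketch only: the log path and unit name assume an Omnibus GitLab install on Ubuntu and may differ on our setup, and the time-window placeholders have to be filled in from the build log):

# sshd side: any connection attempts around the time of the failing build
sudo journalctl -u ssh --since "<build start>" --until "<build failure>"

# gitlab-shell side (default Omnibus log path): key-authenticated Git fetches
# show up here as git-upload-pack entries tied to the deploy key
sudo grep git-upload-pack /var/log/gitlab/gitlab-shell/gitlab-shell.log | tail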

We also checked the topic Frequently encountered problems during builds.

Enabling debug mode didn’t provide additional information during the ‘preparing repo’ stage; more logs only appear once the build gets past this step.

And since the build does sometimes pass, I take it that the permissions to access the repository are set correctly. This rules out the most common root cause for the exit status 128 error.

Please let me know if there’s any additional information that might be useful to figure this out.

Thank you in advance for your help.

I don’t think Self-hosted Git is even supported on the Starter plan. It seems to be a Business plan feature.

@hrishikesh Indeed, setting up a new site hosted on a self-managed GitLab instance is not possible from the Netlify UI.

Therefore, we went for a manual setup using the CLI, with netlify init --manual, so that the repo is treated as a custom Git provider. Going through the steps of the CLI command, everything went well, and both the deploy key and the webhook were set up properly: when a commit is pushed to the repo, a build is triggered on Netlify (so the webhook works), and the build sometimes goes through successfully (so the deploy key is set up correctly).
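For reference, the manual setup looked roughly like this (a sketch: the group/project path is a placeholder, and the exact CLI prompts may vary by version):

# Repo remote the site is linked against (placeholder path)
git remote -v
# origin  git@gitlab.liip.ch:<group>/<project>.git (fetch/push)

# Link the site manually so the repo is treated as a custom Git provider;
# the CLI prints an SSH deploy key to add to the GitLab project and a webhook
# URL to register in the project settings, then asks for the build command
# and publish directory
netlify init --manual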

What’s puzzling is why it so often times out when trying to reach our Git remote (~3 out of 4 attempts fail).

Would you happen to have more logs that could provide us with additional information?

Hi @david.s 👋, we don’t have more logs to provide. If your build is timing out you may need a build timeout increase. Also, please make sure we have the permission to clone the repository you are trying to deploy. The usual cause for the 128 error is that someone made changes to the settings or the repository some time after linking the repository to your site.

Hi @SamO! Thanks for your reply.

If your build is timing out you may need a build timeout increase.

The part that times out is the SSH connection to the Git repo. From the timestamps in the log above (the ‘preparing repo’ stage starts at 1:09:49 PM and fails at 1:12:00 PM), we can see that the timeout hits after a little over 2 minutes.

I understand that the Netlify build timeout is 15 minutes, so increasing that value shouldn’t change anything, right? Moreover, when the build succeeds, it takes only about 2 seconds to get to the next step.

Also please make sure we have the permission to clone the repository you are trying to deploy

We can see in the screenshot below that the build does sometimes succeed, without me changing anything in the settings. Do you think it could still be a permission issue?

hi @david.s

I took a look at this, and it seems we do try to reach out to the remote server, as the build logs show:

6:04:28 AM: Failed during stage 'preparing repo': error checking for ref: : ssh: connect to host gitlab.liip.ch port 22: Connection timed out

This points to some temporary unavailability on the remote server’s side. Is this remote GitLab instance protected or behind a firewall?

Probably not, since I can access it from my own laptop:

❯ nc -zv gitlab.liip.ch 22
Connection to gitlab.liip.ch port 22 [tcp/ssh] succeeded!
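If it helps, a repeated probe could also be left running to check whether the drops are visible from outside Netlify too, something along these lines (interval and timeout picked arbitrarily):

# Probe gitlab.liip.ch:22 every 30 seconds and log only the failures
while true; do
  nc -z -w 10 gitlab.liip.ch 22 || echo "$(date -u +%FT%TZ) port 22 unreachable"
  sleep 30
done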

There’s not much more we can help with here. Could you try to add a test site on a different repo that builds at the exact same time somehow? Different sites build from different clusters (and definitely from different Docker containers), so that may be worth exploring.

Thanks @gualter

Indeed, the GitLab instance is not protected. I’ll still check with our infra provider whether something else could be partially blocking the requests.

We do have other sites linked to other repos on that same GitLab instance, and we see the same behaviour there: some deployments go through, and some time out when trying to SSH to GitLab.

The other sites in question:

  • pefo-website
  • vp2050

Would it be possible to know what command Netlify runs during the “preparing repo” stage, so that we can try to reproduce the issue on our end?

I found that there isn’t a specific Netlify IP range from which the requests are made. But is there a way to know the IP of the server on which a specific build was run?

You can do that by running one of the commands from https://www.cyberciti.biz/faq/ubuntu-linux-determine-your-ip-address/ as part of your build command, but for that the build would need to reach the building stage rather than fail during preparing repo.
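For example, something like this prepended to the build command would log the addresses (a rough sketch: hostname -I prints the container’s local addresses, the curl call, assuming outbound HTTP is allowed, prints the public IP the build egresses from, and npm run build stands in for your real build command):

# e.g. in netlify.toml:
#   [build]
#   command = "hostname -I; curl -s https://api.ipify.org; echo; npm run build"
hostname -I                     # container-local addresses
curl -s https://api.ipify.org   # public egress IP (one of several such services)
echo
npm run build                   # placeholder for the actual build command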

During preparing repo, I think we simply run git checkout to get to the correct branch/PR.
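To exercise the same connection path from your side, a remote ref lookup over SSH against the branch from the failing log should be close enough; a rough approximation (an assumption on my part, not the exact buildbot command) would be:

# <group>/<project> is a placeholder; the ref name is taken from the failing build log
time git ls-remote git@gitlab.liip.ch:<group>/<project>.git refs/heads/feat/label-range-search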

@hrishikesh Indeed, having the IP address of a successful deployment will not be very helpful for investigating the ones that fail.

However, I noticed that since the end of last week, the error I’m getting in the logs has changed.

Before, we had the following:

Failed during stage 'preparing repo': error checking for ref:
: ssh: connect to host gitlab.liip.ch port 22: Connection timed out

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
: exit status 128

And now we have the following:

Failed during stage 'preparing repo': error checking for ref:
: ssh: connect to host gitlab.liip.ch port 22: Cannot assign requested address

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
: exit status 128

The difference is that we’re now getting Cannot assign requested address instead of Connection timed out.

Could it be that something changed on Netlify’s end, which brings up this new error?
Note that it still takes two minutes before we get the error, so it appears that we still hit a timeout.
Moreover, the erratic behaviour continues: we still have successful Git connections that lead to successful builds/deployments.

When I asked our infra provider about this new error, they told me the following:

The new error sounds strange. “Cannot assign requested address” would imply (from what I found online) that they were trying to assign themselves an address on their own infrastructure, which then failed. This would mean the error is on Netlify’s side, which we cannot do anything about.

Could this new error help you figure out the root cause of the issue?