Cache builds fail during 'preparing repo' step; "Error fetching branch"

We have a site configured to deploy from master, and to generate preview deploys for all pull requests.

Starting at 9:12PM EDT yesterday (2021-06-04), all builds using the cache started reporting “Failed during stage 'preparing repo': exit status 1”:

10:24:49 AM: Build ready to start
10:24:51 AM: build-image version: 0582042f4fc261adc7bd8333f34884959c577302
10:24:51 AM: build-image tag: v3.7.6
10:24:51 AM: buildbot version: 874d2252adccd75aaadc8bf7a520d26b3fedbb13
10:24:52 AM: Fetching cached dependencies
10:24:52 AM: Starting to download cache of 5.2GB
10:25:38 AM: Finished downloading cache in 46.346134866s
10:25:38 AM: Starting to extract cache
10:26:50 AM: Finished extracting cache in 1m11.668882572s
10:26:50 AM: Finished fetching cache in 1m58.459408585s
10:26:50 AM: Starting to prepare the repo for build
10:26:51 AM: Preparing Git Reference refs/heads/master
10:27:00 AM: Error fetching branch: refs/heads/master
10:27:00 AM: Creating deploy upload records
10:27:00 AM: Failing build: Failed to prepare repo
10:27:00 AM: Failed during stage 'preparing repo': exit status 1
10:27:00 AM: Finished processing build request in 2m8.777096846s
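
For anyone comparing several failed builds, here is a minimal Python sketch that pulls the first error line and the failing stage out of a log in the format above. The function name `summarize_failure` and the regular expressions are my own assumptions based on the lines shown here, not anything Netlify provides.

```python
import re

# Matches lines such as "10:27:00 AM: Error fetching branch: refs/heads/master".
LINE_RE = re.compile(r"^(\d{1,2}:\d{2}:\d{2} [AP]M): (.*)$")
# Matches the final "Failed during stage '<name>': <reason>" message.
STAGE_RE = re.compile(r"Failed during stage '([^']+)': (.*)")


def summarize_failure(log_text):
    """Return (first_error_message, failed_stage) found in a build log."""
    first_error = None
    failed_stage = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not a timestamped log line
        message = match.group(2)
        if first_error is None and message.startswith("Error"):
            first_error = message
        stage = STAGE_RE.search(message)
        if stage:
            failed_stage = stage.group(1)
    return first_error, failed_stage


sample = """\
10:26:51 AM: Preparing Git Reference refs/heads/master
10:27:00 AM: Error fetching branch: refs/heads/master
10:27:00 AM: Failing build: Failed to prepare repo
10:27:00 AM: Failed during stage 'preparing repo': exit status 1
"""
print(summarize_failure(sample))
```

On the excerpt above this reports “Error fetching branch: refs/heads/master” as the first error and “preparing repo” as the failing stage, which is all the log gives us.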

This test showed that clearing the cache does not fix the issue for subsequent builds.

The first build failing in this manner was Netlify App, a PR build. All subsequent builds that used the cache have failed in the same way, including builds on master (even though the PR had not been merged) and builds of other unrelated PRs.

(I apologize to community members that the links are not public, but I doubt there’s anything interesting there to anyone who does not have access to backend logs. As I said, the log above is representative.)

A representative example is this manually-triggered cache-using build of master that I ran after running a manually triggered cache-clearing build: Netlify App

This is particularly frustrating because the logs are opaque here; they tell us it’s fetching the git ref, but then don’t show us the Git output or what git commands are running, so we have to guess at what led to “exit status 1”, which as we see below can be any of a number of issues.
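
Since the build log hides Git's own output, one way to at least see what Git says about the ref is to ask the remote directly from a local machine. This is only a rough sketch: the helper name `check_ref` is hypothetical, it runs with local credentials rather than Netlify's, and it is not a reproduction of whatever commands the buildbot actually runs. It relies on `git ls-remote --exit-code`, which exits with status 2 when no matching ref is found.

```python
import subprocess


def check_ref(remote, ref, run=subprocess.run):
    """Ask the remote whether `ref` exists; return (ok, git's own stderr)."""
    result = run(
        ["git", "ls-remote", "--exit-code", remote, ref],
        capture_output=True,
        text=True,
    )
    # With --exit-code, git exits with status 2 when no matching ref is found;
    # other non-zero statuses mean git could not talk to the remote at all.
    return result.returncode == 0, result.stderr.strip()
```

Calling `check_ref(remote_url, "refs/heads/master")` with the repository's clone URL returns whether the ref is visible, along with any error text Git printed. A success here only rules out the ref being missing; it says nothing about Netlify's access or its cached copy.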

I have searched the docs and support forums, and have found several things that I checked but have failed to help:

I found “[Support Guide] My site deploy fails unless Netlify’s build cache is cleared”, but none of the tips there are applicable, for 3 reasons: (1) empirically: this was observed to have started on multiple different branches, each with different sets of changes, so it can’t be a result of any changes made to files in our git repo; (2) rationally: it can’t be caused by files in the git repo if the problem is that it fails to fetch the git repo; (3) we haven’t done any of the things mentioned there (besides having both package.json and yarn.lock; surely this was a mistake and the author meant to write package-lock.json and not package.json).

As a possible exception to (2): We do use git submodules. I suppose it is possible that the issue is that it’s having trouble fetching a repository referred to in .gitmodules. This is frustratingly difficult to verify, as the log does not include git’s output or what it’s doing in any detail, but instead just tells us “exit status 1”. As a test, I have created a PR that removes the submodules; if the issue is fetching a submodule then I expect this to still fail, but to fail at the later build step, rather than at the early ‘preparing repo’ step. This does not seem to be the case; even after I have removed the submodules it still fails at the ‘preparing repo’ step.
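
To check the submodule theory more directly, the URLs in .gitmodules can be listed and each one tested by hand. Below is a minimal sketch of a parser for that file. The function name `submodule_urls` and the example submodule are hypothetical, and it only handles the simple `[submodule "name"]` / `key = value` layout rather than the full Git config syntax.

```python
import re

# Section headers in .gitmodules look like: [submodule "name"]
SECTION_RE = re.compile(r'^\[submodule "(.+)"\]$')


def submodule_urls(gitmodules_text):
    """Map each submodule name to its url, given the text of .gitmodules."""
    urls = {}
    current = None
    for raw in gitmodules_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        section = SECTION_RE.match(line)
        if section:
            current = section.group(1)
            continue
        if line.startswith("["):
            current = None  # some other kind of section; ignore its keys
            continue
        key, sep, value = line.partition("=")
        if current and sep and key.strip() == "url":
            urls[current] = value.strip()
    return urls


# Hypothetical example content, not our real .gitmodules.
example = '''
[submodule "themes/base"]
	path = themes/base
	url = https://github.com/example/base-theme.git
'''
print(submodule_urls(example))
```

Each URL it returns can then be checked individually, for example to see whether a renamed repository still resolves.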

I found a customer problem Deploy fails with “failed during stage ‘preparing repo’: exit status 1” - #6 by tedgoas which looks similar and involves a Netlify bug, but offers no follow-up as to whether that bug ever got fixed (last update: May 14, 2020).

I found another customer problem Failing build: Failed to prepare repo, failed during stage 'preparing repo': exit status 1 - #2 by perry which looks similar, but magically went away after some time, giving me no path to resolving it in our case.

I found another customer problem Deploy fails with "failed during stage 'preparing repo': exit status 1" - #3 by tedgoas which looks similar. I attempted the suggested step of logging in to GitHub and removing Netlify’s authorization and then re-adding the authorization. This did not resolve the issue.

The web-UI build settings are as follows:

I found another similar-looking customer issue Frequent failures in 'preparing repo' stage, but upon inspection it does not seem to be relevant, because the problem there isn’t “Error fetching branch” and seems to be in post-checkout steps that are lumped in to ‘preparing repo’.

It’s not fetching the ref; this is likely already stored with Netlify. They have to store the branch you want to build, and that’s likely what you’re seeing here: Netlify confirming which branch they are trying to fetch.

The actual error is the second line, the one that clearly states “Error”. It’s not able to fetch your branch from GitHub.

Did something change on GitHub itself? Does Netlify still have access to the branch, the repo, the organization? Not much of help here, but the error isn’t opaque, it’s stating clear as day that it can’t fetch the repo. Edit: Notice now that you actually tried to re-add GitHub to Netlify… TLDR; :stuck_out_tongue:

You’re correct, it’s not totally opaque, but we still have little insight as to why it’s failing to fetch the ref. That error message is coming from Netlify, not from git; the actual error message from Git is being hidden.
But still, it is enough information to differentiate it from the other issue linked in my previous comment.

Total permission borkage seems unlikely, as it is able to fetch the ref and build correctly when clearing the cache.

The only thing I can think of would be that one of the submodules’ repos did get renamed/moved to a different org. But it’s a public repo (so permissions shouldn’t make a difference) and GitHub redirects/forwards to allow old names to keep working (so the rename shouldn’t make a difference). And plus, I did verify that this still happens even if I totally remove the submodules.

I missed the fact that it was working when you cleared the cache; I had the exact same issue myself, and it was because I was using the module I made in the same repo that showcased the example. Netlify had issues with this, due to the multiple package.json’s, where one defined the npm module and the other used the said npm module.

Not related; but it’s not impossible at all that submodules cause this.

Uninstalling the “Gatsby cache” Netlify plugin has resolved the issue (so the cache is now just for speeding up git clone?). Which is bonkers, because how is Gatsby even coming in to it if the error is “Error fetching branch”!?

But this is not an acceptable long-term solution because now builds are slow. We really lean on the cache to make deploys take a reasonable amount of time.

After having left the “Gatsby cache” plugin uninstalled for a few days, adding the plugin back in does not cause the issue to return.

So I guess the issue is resolved, but it isn’t a very satisfying resolution because we still have no idea what caused it, and don’t really understand what we changed to fix it.

hi @LukeShu , i am inclined to agree - that does seem like a pretty unsatisfying solution.

a few questions:

have you reported the problems, like so, on the plugin page?


can you share your package.json with us?

thank you, and i am glad it is at least working again now for the time being.

Hi @perry,

I’ve just now reported the problem to the plugin: Caused "Error fetching branch" · Issue #56 · jlengstorf/netlify-plugin-gatsby-cache · GitHub. I’ve posted the package.json there (I can’t seem to figure out how to do non-image uploads here).

hi @LukeShu - i had a chance to chat with our gatsby pro the other day, and we discussed this. Current working hypothesis is that gatsby cache isn’t bulletproof - if there is something wonky with the cache it may break.

We are thinking that when you moved a submodule, if you didn’t clear the cache right after that, it may have hung on to some incorrect references that are causing these headaches. that would also explain why removing and then putting it back fixed it.