Gatsby v4 Builds suddenly killed, exit code 137

For the record, I’ve been hitting the same issue for the past few days, with no code changes and, in our case, on Gatsby v2. The site builds locally, so this does look like a Netlify issue. In my case the error happens after the page queries have run:

7:33:28 PM: success run page queries - 11.151s - 530/530 47.53/s
7:40:41 PM: Killed
7:40:41 PM: error Command failed with exit code 137.
7:40:41 PM: info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
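
For readers decoding the log above: a shell reports a process that was terminated by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). On Linux that is commonly, though not exclusively, what the kernel's out-of-memory killer sends, which would fit the bare “Killed” line. A minimal sketch of the arithmetic follows; the helper name `decode_exit` is made up for illustration.

```shell
# Hypothetical helper: explain a shell exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_exit() {
  status=$1
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    case "$sig" in
      9)  name=SIGKILL ;;
      15) name=SIGTERM ;;
      *)  name="name not listed here" ;;
    esac
    echo "killed by signal $sig ($name)"
  else
    echo "exited on its own with status $status"
  fi
}

decode_exit 137   # prints: killed by signal 9 (SIGKILL)
```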

My site name is “slashdata-website” in case you want to check, and for building I’ve been using: gatsby build && ps auxw ; false

As part of the errors I’m getting a bunch of:

7:40:42 PM: Started saving yarn cache
7:40:44 PM: events.js:174
7:40:44 PM:       throw er; // Unhandled 'error' event
7:40:44 PM:       ^
7:40:44 PM: Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
7:40:44 PM:     at process.target.send (internal/child_process.js:636:16)
7:40:44 PM:     at reportSuccess (/opt/build/repo/node_modules/jest-worker/build/workers/processChild.js:83:11)
7:40:44 PM:     at tryCatcher (/opt/build/repo/node_modules/bluebird/js/release/util.js:16:23)
7:40:44 PM:     at Promise._settlePromiseFromHandler (/opt/build/repo/node_modules/bluebird/js/release/promise.js:547:31)
7:40:44 PM:     at Promise._settlePromise (/opt/build/repo/node_modules/bluebird/js/release/promise.js:604:18)
7:40:44 PM:     at Promise._settlePromise0 (/opt/build/repo/node_modules/bluebird/js/release/promise.js:649:10)
7:40:44 PM:     at Promise._settlePromises (/opt/build/repo/node_modules/bluebird/js/release/promise.js:729:18)
7:40:44 PM:     at Promise._fulfill (/opt/build/repo/node_modules/bluebird/js/release/promise.js:673:18)
7:40:44 PM:     at MappingPromiseArray.PromiseArray._resolve 

Any tip appreciated.

Any update @hrishikesh ?

Hi @slashdata, while your error is also exit code 137, I’m not sure it’s the same as mine, because mine is specific to Gatsby v4. In fact, I’ve been “fixing” this problem by downgrading Gatsby to v3 in the projects where I need to.

Thanks @fvieira, that’s good to know. I’m trying to figure out what the issue could be on my side, given that the project builds successfully locally (production build) and we didn’t make any changes to it. I asked for the build time limit to be increased, thinking this could be a timeout error, but that doesn’t seem to be the case. If this persists I think I’ll move this to AWS.

Nothing specific changed that would cause your builds in particular to fail. Build memory is allocated dynamically at the start of a build, and it depends on server load among other factors. Sometimes you might get more memory, sometimes less.

The memory that you got is already a recent upgrade. We used to offer 3 GB of memory until about a month ago, I believe.

Possibly because of PQR (Parallel Query Running): Parallel Query Running · gatsbyjs/gatsby · Discussion #32389 · GitHub and Need clarification on disabling PARALLEL_QUERY_RUNNING at a runtime · gatsbyjs/gatsby · Discussion #34473 · GitHub

Unfortunately, we cannot answer that. We don’t know your code base, what your site is doing, or how it’s building. As for the differences, note that Netlify uses Ubuntu Focal. If you’re on Windows or macOS, that is a major difference, since things can behave differently across operating systems. You can use the Netlify Build Image Docker container to test your builds locally.

Here you go:

@slashdata

Your website met the same issue:

Not sure what Gatsby version you are using, but there was a similar Gatsby issue which one of our devs fixed:

Maybe that would help?

Thanks a lot for your answer @hrishikesh, but my feeling is that something changed on Netlify in the last few days which triggered this error. As I mentioned in my original comment, we’re using 2.23.3 (a rather old version, I know) and we haven’t made any changes to our code base for weeks, so I can’t find any explanation other than something changing in your build environment.

Thank you for the reference to that issue, but unfortunately upgrading to v4 is something we can’t afford to do right now, given that such an upgrade would take days.

So if the issue seems to be due to using a single core for building, I’m wondering if there is any way to set this in Netlify (perhaps by upgrading our plan?) so that this bug doesn’t happen, to buy some time until we do the Gatsby update, etc. Otherwise I’m afraid our only option would be migrating to AWS (I just set up CodePipeline and the project built as expected), at the expense of also having to migrate the forms and functions functionality we were using through Netlify.

Thank you in advance.

@hrishikesh, thank you for the effort in your reply, I’ll have a deeper look into this later when I have a bit more time, but your answer is appreciated.

@slashdata, does your build depend on any external service besides Netlify? Because I’m in the same situation as you are, with builds that used to work and that stopped working all of a sudden even though the code didn’t change, but I’m not ready to say for sure this was due to a Netlify change as I also depend at least on Sanity’s api (which is what the Gatsby code is speaking with when the error occurs), so this error could be due to a change in Sanity’s api which somehow increases memory consumption on Gatsby v4, going over Netlify’s limit. So, while I still feel Netlify is the most likely culprit, I can’t say it is for sure yet until I’ve ruled out other external players.

@fvieira we use two Gatsby plugins to retrieve content from Keystone and WordPress but, again, nothing in our code (including plugins) has been updated. Other than that, we retrieve some other data from a couple of APIs, but these are not the cause for sure. Also, as I mentioned before, I’ve been able to deploy the exact same code base in AWS CodePipeline, so the cause is pretty clear to me.

I’m just noticing on my sites the following alert: “After August 18, 2022, builds for this site will fail unless you update the build image.” which makes me wonder if this was part of a bigger update which triggered this issue. I really don’t know.

That could be true - but if it were, we’d have seen a lot more complaints than you and fvieira. But, here’s also a change log of what’s changing: Releases · netlify/build · GitHub.

For Gatsby 4, I think I’ve mentioned the cause - it’s most likely PQR. Gatsby says that it takes a lot more memory, so as long as you can set the CPU Count to 1, it should work. I don’t know why it’s causing issues on Gatsby 2 though. From my point of view, I think Gatsby is the one at fault here for consuming 6 GB and 8 GB memory. In all of the site builds that have failed due to lack of memory (at least on Netlify), Gatsby has accounted for the highest number. I have heard that Gatsby 4 has improved this by a huge margin, but seeing the above post doesn’t look too promising - however, it would be wrong to judge from one or two sites failing to build.
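
A sketch of pinning the CPU count to 1, as suggested above. GATSBY_CPU_COUNT is the environment variable Gatsby reads to decide how many cores its workers may use. Whether a value of 1 fully switches off PQR is what the linked discussions cover, and it is not verified here. The wrapper name `run_single_core` is hypothetical.

```shell
# Hypothetical wrapper: run any command with Gatsby limited to one core.
run_single_core() {
  env GATSBY_CPU_COUNT=1 "$@"
}

# Demo that the variable reaches the child process:
run_single_core sh -c 'echo "GATSBY_CPU_COUNT=$GATSBY_CPU_COUNT"'   # prints: GATSBY_CPU_COUNT=1

# Intended use, from the project root (not executed here):
#   run_single_core npx gatsby build
```

On Netlify the same variable can instead be set in the site's environment variable settings or under [build.environment] in netlify.toml, which avoids changing the build command.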

To be honest, I would not think this to be a Sanity API issue - because the task of an API is to return data - not much to do with memory consumption on the client-side. It’s understandable that you all think this is a Netlify issue - but a lot of Gatsby sites are currently also building correctly on Netlify - so not really sure who needs to own the blame here. So @fvieira, you can save yourself some trouble and not do a lot of testing - unless you’re extremely determined to.

@slashdata, you’re free to use AWS for your needs - you need not mention that in every other message. If Netlify is not suiting your use case, or if AWS is more suitable, we don’t wish to force you to use our platform - you’re free to make a switch. We won’t be happy about it, but we’d at least know where we need to improve.

Lastly, the warning that you’re seeing is not August 18 but November 15. It’s simply updating the build image and nothing is changing for you unless you manually upgrade. If you don’t upgrade, you won’t be able to build your sites after that date anymore, but that has nothing to do with this error.

I’m sorry I don’t have better news for you both, but build failures due to running out of memory are not something we can debug effectively. We’ve made tools available (like the Netlify Build Image mentioned above) that you can use locally to see if there’s a specific limit at which your site builds fine, so you can try to optimise your process accordingly, and this has helped a few people in the past. They found memory leaks and better solutions in their codebases, which helped them lower memory consumption.

As I’ve said above, the upgraded RAM limits in the build pods are a fairly recent change - so based on your descriptions, it sounds like until a few weeks ago you would never have been able to build these sites on Netlify - at least not with Gatsby.

Thanks for your thorough response @hrishikesh. Regarding my mention of AWS, it wasn’t made for promotional purposes (not that they need it anyway) if that’s what you’re suggesting, but as a way to confirm whether the build issue was on our side or not, and also to answer @fvieira’s comment. Netlify has been serving us very well for many months, but unfortunately whatever happened that is preventing our site from building as it used to is quite a bummer. I for one would love to continue using it as I’ve been doing so far, and the fact that we rely on a couple of Netlify services to run our forms and functions makes this potential migration harder, so it’s not exactly something that I’m looking forward to.

Thanks again for your aid.

Hi @hrishikesh, I’m back to trying to re-upgrade my websites to Gatsby 4 and unfortunately the problem persists: it still fails with exit code 137.

I’ve run the Docker image you suggested and used it to build my project, and never saw it go over 2 GB. In fact it isn’t even over 1 GB at the point where it crashes on Netlify, so I have no idea why it uses more than 8 GB on Netlify…

As soon as I have time I’ll try building an empty Gatsby 4 project on Netlify and see if it also crashes. If it does, then there’s something very wrong with either Gatsby or Netlify. If it doesn’t, maybe it really is something I’m doing in all of my projects (although I doubt it, as they were working fine on Netlify until August).

Interesting to hear, @fvieira. In our own testing, the Docker image takes around 1 GB just for the OS with no build running, so it would be pretty shocking for your build to take 0 GB, which is what never going over 1 GB would imply.

Anyhow, we’ll be happy to take another look when you have the Gatsby 4 build up here, since we can see memory used after the fact and let you know whether that is the problem with a newer build!

@fool I used htop to look at the memory (probably not the perfect way to measure memory but still), and the build process never went above 2GB that I could see.
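
One possible reason for a gap between what htop shows and what the build host reports: htop lists memory per process, while a Gatsby build spawns worker processes (the jest-worker frames in the stack trace earlier in the thread come from such a child), and a limit applied to the whole build container would count all of them together. A rough sketch that totals resident memory over a process tree follows. The function name `tree_rss_kib` is made up, and summing RSS double-counts pages shared between processes, so treat the result as an upper-bound estimate rather than as what Netlify measures.

```shell
# Hypothetical helper: sum RSS (KiB) of a process and all of its descendants.
# Expects "pid ppid rss" rows on stdin, for example from: ps -eo pid=,ppid=,rss=
tree_rss_kib() {
  awk -v root="$1" '
    { parent[$1] = $2; rss[$1] = $3 }
    END {
      total = 0
      for (pid in rss) {
        # Climb the parent chain; stop at root or when the chain leaves the table.
        p = pid
        hops = 0
        while (p != root && (p in parent) && hops < 1000) {
          p = parent[p]
          hops++
        }
        if (p == root) total += rss[pid]
      }
      print total
    }'
}

# Sample rows: process 100 has child 101 and grandchild 102; 200 is unrelated.
printf '%s\n' '100 1 500' '101 100 300' '102 101 200' '200 1 900' | tree_rss_kib 100   # prints: 1000
```

While a build is running, the live equivalent would be `ps -eo pid=,ppid=,rss= | tree_rss_kib <pid of the build command>`, sampled repeatedly to catch the peak.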

@hrishikesh I’ve done some more testing, and came to some very interesting findings.

First, I tried building a Gatsby v4 empty project (basically the result of running npx gatsby new), and tried to build it on Netlify and it worked.

This proved there was something in my code breaking things, so I started deleting parts of my codebase to pinpoint the offending code. After a ton of binary searching, I found that the problem was my gatsby-node.js importing an i18n.tsx file. If I deleted that import the build succeeded, so instead I deleted all the code inside that file and ran a bunch of tests:

  • Importing empty file named i18n.tsx → fails
  • Importing empty file named i18n.ts → fails
  • Importing empty file named i18n.js → builds
  • Importing empty file named internationalization.tsx → fails
  • Importing empty file named internationalization.ts → fails
  • Importing empty file named internationalization.js → builds

So, at first glance it seems that something (not sure whether Gatsby or Netlify) doesn’t like me importing .ts or .tsx files from inside gatsby-node.js… except that I have another import of a .ts file in gatsby-node.js and it builds happily anyway, so I’m still not sure what the deal is here…

If someone from Netlify could take a look at the two builds below and show me the memory usage graph, it might help, as the only difference between them is that one is importing the internationalization.ts empty file and fails, and the other is importing internationalization.js and builds. If this has a large impact on memory usage it might be a Gatsby v4 “problem” with mixing .js and .ts files (or not, I’m far from certain of anything at this point)…

Success: 6364545c9fddac0009c9fde7
Fails: 6364550194b90500093e87b0

One more thing, the tests above where I only changed the .ts to a .js file, were made with most of my project code deleted.

I now tested picking the whole project, and just changing that file from .tsx to .js, and although the build got a bit further, it still crashed with the exit code 137.

This led me to believe that importing .ts or .tsx files from .js files is what causes the memory spike, and that removing just one instance of this in the complete project is not enough. So I went through all my Gatsby .js files and converted all .ts imports to .js (and deleted a bit of unused code related to translations), and now my project finally builds!

I’d love to also see the memory usage graph for this build as it has a ton more code than the ones that were failing in my message above. The build id is 63645da494b90500093f09b3.

And just to add that, while there is definitely something going on in Gatsby, I’m still convinced there must have been some change on Netlify’s side to trigger this too, because I don’t see any other reason for my Gatsby v4 projects to stop building all at once without any changes in the code.

Thanks for doing all that research, @fvieira!

Here’s a graph of the successful one:

failed one:

Interestingly, both took 8.59 GB (max). The allocated memory for both these builds was 8 GiB. So not sure why one would fail and the other would not.

The one where you converted all your project to JS took 8.57 GB:

and even this build has only 8 GiB memory allocated.
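
A note on units, since the graphs are in GB and the allocation is in GiB: 8 GiB is 8 × 1024³ bytes, which is about 8.59 GB, so a reported maximum of 8.59 GB is the same figure as the 8 GiB allocation, and 8.57 GB is roughly 7.98 GiB. One reading, offered only as a hypothesis, is that all three builds ran at or very near the cap, in which case the peak value alone would not separate passing builds from failing ones. The conversion, with made-up helper names:

```shell
# Hypothetical helpers for converting between binary (GiB) and decimal (GB) units.
gib_to_gb() {
  awk -v x="$1" 'BEGIN { printf "%.2f\n", x * 1024 * 1024 * 1024 / 1000000000 }'
}
gb_to_gib() {
  awk -v x="$1" 'BEGIN { printf "%.2f\n", x * 1000000000 / (1024 * 1024 * 1024) }'
}

gib_to_gb 8      # allocation quoted above; prints 8.59
gb_to_gib 8.57   # peak of the all-JS build; prints 7.98
```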

As I’ve already said, with Gatsby 4, you can probably try:

Hi @hrishikesh, sure, I could try to disable parallel query running to reduce memory, but according to the graphs you shared there seems to be about zero relation between memory usage and whether a build succeeds or fails, which just makes things even more confusing, if possible…
Isn’t there anything else you can do to try to understand why the builds were failing? Because right now I know the solution, but have no idea why it works and what else I could do to prevent this in the future…

Hi @fvieira - The reason my colleague Hrishikesh is mentioning the memory consumption for these builds is that once your build consumes the allocated limit, it will fail. There are builds that can consume a slight bit above the allocated memory and continue without failure, but that is not always a guarantee.

The remaining question is which process in your build is consuming memory up to the allocated limit. This can happen with either a runaway process or a Gatsby setting that affects resource consumption, as Hrishikesh pointed out with the discussion on PARALLEL_QUERY_RUNNING: Need clarification on disabling PARALLEL_QUERY_RUNNING at a runtime · Discussion #34473 · gatsbyjs/gatsby · GitHub

Hi @Kai-Mavyn, sure, I understand that going over the allocated limit should make the build fail, what I don’t understand is that according to those graphs, both the failing and working builds are using the same memory!

I’ve managed to fix the builds by removing most places where I was importing .ts files from my gatsby .js files, but according to the graphs it wasn’t the memory usage that changed… And it’s also not just a random thing, they have been consistently working since I did that change… That’s why I was asking for some help to get to the bottom of why the builds were failing and are now working, since the memory usage doesn’t seem to explain that.

The website is golden-visa-community-website-kw6eskqd, maybe you can check the memory usage for some successful builds and let me know if they are different from the ones killed with code 137.