Gatsby v4 Builds suddenly killed, exit code 137

Hi everyone. All of my Gatsby v4 sites suddenly started failing to build with the error “Command failed with exit code 137”. This wasn’t happening as recently as last week. On multiple projects I’ve tried rebuilding from commits that had recently built successfully, and now they all break.
From what I read, this error seems to be related to memory issues, but the websites are very small (one of them is 3 pages, the whole data from Sanity is 6MB, assets included), and the fact that they used to build successfully and that this started happening in multiple projects at once makes me think this is something that changed on Netlify’s side.
Also, all my Gatsby v2 and v3 websites still build successfully. But again, it’s not just a Gatsby problem: I haven’t upgraded Gatsby recently in the projects that are failing. Most of them have always been Gatsby v4 projects and worked fine until now.
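For context on the error itself: shells report a process that was terminated by signal N as exit code 128 + N, so 137 corresponds to signal 9 (SIGKILL). On Linux build containers a SIGKILL very often comes from the kernel's out-of-memory handling, although other senders are possible. The sketch below only decodes the number; the function name `describeExitCode` is made up for illustration.

```javascript
const os = require('os');

// Decode a shell-style exit code. Codes above 128 conventionally mean
// "terminated by signal (code - 128)".
function describeExitCode(code) {
  if (code <= 128) {
    // The process exited on its own with this status.
    return { signalNumber: null, signal: null };
  }
  const signalNumber = code - 128;
  // os.constants.signals maps names such as "SIGKILL" to their numbers.
  const entry = Object.entries(os.constants.signals)
    .find(([, num]) => num === signalNumber);
  return { signalNumber, signal: entry ? entry[0] : null };
}

console.log(describeExitCode(137));
```

This says nothing about who sent the signal. It only confirms that the build was killed from outside rather than exiting with an error of its own.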

Here’s the full log for one of my failed builds (for one of my projects): Netlify failed build, exit code 137 · GitHub
And here’s the full log for a successful build based on the same commit: Netlify successful build · GitHub

Can you please give me some guidance? I have multiple projects stalled due to this problem.

I tried running the build with npx process-top ./node_modules/.bin/gatsby build instead of yarn build to see whether there was some crazy memory usage going on, but it doesn’t seem like it: it didn’t even hit 1GB. So I see no reason for the build to fail; it really seems to be a Netlify problem (triggered by some Gatsby 4 behaviour, but still).

You can see the logs for this build here: Netlify failed build, exit code 137, with npx process-top · GitHub
Note: the build appears to succeed, but it actually fails; npx process-top seems to swallow the error.
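One possible reason a per-process tool could show under 1GB while the container as a whole uses far more: as far as I understand it, process-top reports on the single Node process it is loaded into, whereas Gatsby can hand work to child worker processes. A rough way to check is to sum resident memory over the whole process tree. The sketch below assumes output from `ps -eo pid,ppid,rss` (RSS in KiB); the function name and the sample numbers are invented for illustration, and summing RSS overcounts pages shared between processes.

```javascript
// Sum resident memory (RSS, KiB) for a process and all of its descendants,
// given the text output of `ps -eo pid,ppid,rss`.
function sumTreeRssKiB(psOutput, rootPid) {
  const rssOf = new Map();
  const childrenOf = new Map();

  for (const line of psOutput.trim().split('\n')) {
    const [pid, ppid, rss] = line.trim().split(/\s+/).map(Number);
    // The header row parses to NaN and is skipped here.
    if (![pid, ppid, rss].every(Number.isInteger)) continue;
    rssOf.set(pid, rss);
    if (!childrenOf.has(ppid)) childrenOf.set(ppid, []);
    childrenOf.get(ppid).push(pid);
  }

  // Walk the tree iteratively from the root process.
  let total = 0;
  const seen = new Set();
  const stack = [rootPid];
  while (stack.length > 0) {
    const pid = stack.pop();
    if (seen.has(pid)) continue;
    seen.add(pid);
    total += rssOf.get(pid) || 0;
    stack.push(...(childrenOf.get(pid) || []));
  }
  return total;
}

// Made-up sample: a parent (pid 100) with two worker children.
const sample = `
  PID  PPID     RSS
    1     0    1000
  100     1  500000
  101   100  900000
  102   100  850000
  200     1   30000
`;
console.log(sumTreeRssKiB(sample, 100)); // 2250000
```

In the sample, the parent alone sits near 0.5 GB while the tree totals over 2 GB, which is the kind of gap a single-process reading would hide.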

Hey there, @fvieira :wave:

Thanks for reaching out about this. Sorry to hear you are encountering obstacles. Can you please share your project repo? This will help us look into your setup further.

Sure, I’ll send you the code for one of our Gatsby v4 projects over a private message.

Hi @fvieira

Not sure what your tool is reporting, but you’re definitely needing much more than 1 GB RAM:

8 GB to be specific, where the build finally gives up and fails.

This is the graph showing memory usage for your build (the gist that you shared).

Hi @hrishikesh, thank you for your reply. You’re probably right, but I still have a few questions (you probably don’t have answers for all of them, but something is better than nothing):

  • Why wasn’t this happening before last week if my code didn’t change at all? It must have been some change in Netlify, can you help me identify what changed so I can understand what I can do to solve this?
  • Why is this happening only to my Gatsby V4 websites? Maybe this has to do with Gatsby’s V4 memory consumption, but I wonder if no one else is having this problem, I don’t think I’m doing anything weird on all my V4 websites…
  • How can a 3 page website build take 8.6GB of memory? Locally it doesn’t use more than 2GB, what could be the difference in Netlify? The Node version? Can you show me the memory usage for the successful build (the one using Gatsby V3: Netlify successful build · GitHub)?

Some extra info: I had one Gatsby V4 build work, but unlike all the other websites, that one is not connected to a CMS (Sanity). That makes sense, as the other builds fail on GraphQL-related stuff. Still, the problem is not simply explained by the connection to Sanity, because the Gatsby V3 websites are also connected to Sanity and they work…

For the record I’m suffering the same issue since a few days ago, with no code changes and in our case using Gatsby V2. Site builds locally so this looks indeed like a Netlify issue. In my case the error happens after page queries are made:

7:33:28 PM: success run page queries - 11.151s - 530/530 47.53/s
7:40:41 PM: Killed
7:40:41 PM: error Command failed with exit code 137.
7:40:41 PM: info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

My site name is “slashdata-website” in case you want to check, and for building I’ve been using: gatsby build && ps auxw ; false

As part of the errors I’m getting a bunch of:

7:40:42 PM: Started saving yarn cache
7:40:44 PM: events.js:174
7:40:44 PM:       throw er; // Unhandled 'error' event
7:40:44 PM:       ^
7:40:44 PM: Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
7:40:44 PM:     at process.target.send (internal/child_process.js:636:16)
7:40:44 PM:     at reportSuccess (/opt/build/repo/node_modules/jest-worker/build/workers/processChild.js:83:11)
7:40:44 PM:     at tryCatcher (/opt/build/repo/node_modules/bluebird/js/release/util.js:16:23)
7:40:44 PM:     at Promise._settlePromiseFromHandler (/opt/build/repo/node_modules/bluebird/js/release/promise.js:547:31)
7:40:44 PM:     at Promise._settlePromise (/opt/build/repo/node_modules/bluebird/js/release/promise.js:604:18)
7:40:44 PM:     at Promise._settlePromise0 (/opt/build/repo/node_modules/bluebird/js/release/promise.js:649:10)
7:40:44 PM:     at Promise._settlePromises (/opt/build/repo/node_modules/bluebird/js/release/promise.js:729:18)
7:40:44 PM:     at Promise._fulfill (/opt/build/repo/node_modules/bluebird/js/release/promise.js:673:18)
7:40:44 PM:     at MappingPromiseArray.PromiseArray._resolve 

Any tip appreciated.

Any update @hrishikesh ?

Hi @slashdata, while your error is also the exit code 137, I’m not sure if it’s the same as mine, because mine is specific to Gatsby v4, in fact, I’ve been “fixing” this problem by downgrading Gatsby to v3 in the projects I need.

Thanks @fvieira, that’s good to know. I’m trying to figure out what the issue could be on my side, given that the project builds successfully locally (production build) and we didn’t make any changes to it. I requested an increase in the build time limit, thinking this could be due to a timeout error, but that doesn’t seem to be the case. If this persists I think I’ll move this to AWS.

Nothing specific changed that would cause particularly your builds to fail. Build memory is dynamically allocated during the start of a build - it depends on server load among other factors. Sometimes you might get more memory, sometimes less.

The memory that you got is already a recent upgrade. Until about a month ago, I believe, we offered only 3 GB of memory.

Possibly because of PQR (Parallel Query Running): Parallel Query Running · gatsbyjs/gatsby · Discussion #32389 · GitHub and Need clarification on disabling PARALLEL_QUERY_RUNNING at a runtime · gatsbyjs/gatsby · Discussion #34473 · GitHub

Unfortunately, we cannot answer that. We don’t know your code base, and we don’t know what your site is doing or how it’s building. As far as the differences are concerned, do note that Netlify is using Ubuntu Focal. If you’re using Windows or macOS, this is a major difference, as things might work differently on a different OS. You can use the Netlify Build Image Docker container to test your builds locally.
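When testing in a container locally, it may help to give the container a hard memory cap so that an out-of-memory kill can actually be reproduced; without one, the container can use whatever the host has. The sketch below only assembles arguments for `docker run`. The `--memory` and `--memory-swap` flags are standard Docker options, but the image name `netlify/build:focal` and the `yarn build` command are assumptions for illustration, and the build image's own repository describes the supported way to launch a build.

```javascript
// Build the argument list for `docker run` with a hard memory cap.
// The image and command passed in are illustrative assumptions,
// not a documented Netlify invocation.
function dockerRunArgs({ image, memoryGb, repoDir, command }) {
  return [
    'run',
    '--rm',
    `--memory=${memoryGb}g`,
    // Setting --memory-swap equal to --memory leaves the container no swap.
    `--memory-swap=${memoryGb}g`,
    '-v', `${repoDir}:/opt/build/repo`,
    '-w', '/opt/build/repo',
    image,
    ...command,
  ];
}

const args = dockerRunArgs({
  image: 'netlify/build:focal', // assumed image name
  memoryGb: 8,
  repoDir: '/path/to/site',     // hypothetical local checkout
  command: ['yarn', 'build'],   // assumed build command
});
console.log(['docker', ...args].join(' '));
```

Lowering `memoryGb` step by step would show roughly where the build starts getting killed.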

Here you go:

@slashdata

Your website met the same issue:

Not sure what Gatsby version you are using, but there was a similar Gatsby issue which one of our devs fixed:

Maybe that would help?

Thanks a lot for your answer @hrishikesh but my feeling is that something changed on Netlify during these last days which triggered this error. As I mentioned in my original comment, we’re using 2.23.3 (a rather old version, I know) and we didn’t perform any change in our code base for weeks, so I can’t find any explanation other than something changing in your build environment.

Thank you for the reference to that issue, but unfortunately in our case upgrading to v4 is something we can’t afford to do right now, given that such an upgrade would take days.

So if the issue seems to be due to using single-core for building, I’m wondering if there is any way to set this in Netlify (perhaps upgrading plan?) in order for this bug not to happen and to buy some time until we do the gatsby update, etc. Otherwise I’m afraid that our only option would be migrating to AWS (I just set CodePipeline and the project built as expected) at the expense of having to also migrate our forms and functions functionality which we were using through Netlify.

Thank you in advance.

@hrishikesh, thank you for the effort in your reply, I’ll have a deeper look into this later when I have a bit more time, but your answer is appreciated.

@slashdata, does your build depend on any external service besides Netlify? Because I’m in the same situation as you are, with builds that used to work and that stopped working all of a sudden even though the code didn’t change, but I’m not ready to say for sure this was due to a Netlify change as I also depend at least on Sanity’s api (which is what the Gatsby code is speaking with when the error occurs), so this error could be due to a change in Sanity’s api which somehow increases memory consumption on Gatsby v4, going over Netlify’s limit. So, while I still feel Netlify is the most likely culprit, I can’t say it is for sure yet until I’ve ruled out other external players.

@fvieira we use two gatsby plugins to retrieve content from keystone and wordpress but again, nothing in our code (including plugins) has been updated. Other than that we retrieve some other data from a couple of APIs but these are not the cause for sure. Also, as I mentioned before I’ve been able to deploy the exact code base in AWS CodePipeline so the cause is pretty clear to me.

I’m just noticing on my sites the following alert: “After August 18, 2022, builds for this site will fail unless you update the build image.” which makes me wonder if this was part of a bigger update which triggered this issue. I really don’t know.

That could be true - but if it were, we’d have seen a lot more complaints than you and fvieira. But, here’s also a change log of what’s changing: Releases · netlify/build · GitHub.

For Gatsby 4, I think I’ve mentioned the cause - it’s most likely PQR. Gatsby says that it takes a lot more memory, so as long as you can set the CPU Count to 1, it should work. I don’t know why it’s causing issues on Gatsby 2 though. From my point of view, I think Gatsby is the one at fault here for consuming 6 GB and 8 GB memory. In all of the site builds that have failed due to lack of memory (at least on Netlify), Gatsby has accounted for the highest number. I have heard that Gatsby 4 has improved this by a huge margin, but seeing the above post doesn’t look too promising - however, it would be wrong to judge from one or two sites failing to build.
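Setting the CPU count to 1 is usually done through the `GATSBY_CPU_COUNT` environment variable, which Gatsby documents for controlling how many cores its build uses. A minimal sketch of building with that variable set is below; the helper name `gatsbyBuildEnv` is made up, and whether this is enough to avoid the memory spike on a given site is an assumption to verify, not a guarantee.

```javascript
// Return a copy of the environment with Gatsby limited to `cpuCount` cores.
// The original environment object is left untouched.
function gatsbyBuildEnv(baseEnv, cpuCount = 1) {
  return { ...baseEnv, GATSBY_CPU_COUNT: String(cpuCount) };
}

// Example use (left commented out so nothing is spawned on load):
// const { spawnSync } = require('child_process');
// const result = spawnSync('npx', ['gatsby', 'build'], {
//   stdio: 'inherit',
//   env: gatsbyBuildEnv(process.env),
// });
// process.exit(result.status === null ? 1 : result.status);

console.log(gatsbyBuildEnv({ NODE_ENV: 'production' }).GATSBY_CPU_COUNT);
```

On Netlify the same variable can be set as a site environment variable instead of in a wrapper script.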

To be honest, I would not think this to be a Sanity API issue - because the task of an API is to return data - not much to do with memory consumption on the client-side. It’s understandable that you all think this is a Netlify issue - but a lot of Gatsby sites are currently also building correctly on Netlify - so not really sure who needs to own the blame here. So @fvieira, you can save yourself some trouble and not do a lot of testing - unless you’re extremely determined to.

@slashdata, you’re free to use AWS for your needs - you need not mention that in every other message. If Netlify is not suiting your use case, or if AWS is more suitable, we don’t wish to force you to use our platform - you’re free to make a switch. We won’t be happy about it, but we’d at least know where we need to improve.

Lastly, the warning that you’re seeing is not August 18 but November 15. It’s simply updating the build image and nothing is changing for you unless you manually upgrade. If you don’t upgrade, you won’t be able to build your sites after that date anymore, but that has nothing to do with this error.

I’m sorry I do not have better news for you both, but build failures due to running out of memory are not something we can debug effectively. We’ve made tools available (like the Netlify Build Image mentioned above) that you can use locally to see if there’s a specific limit at which your site builds fine, so you can try to optimise your process accordingly. This has helped a few people in the past: they found memory leaks and more optimal solutions in their codebases, which lowered their memory consumption.

As I’ve said above, the upgraded RAM limits in the build pods are a fairly recent change. So, based on your descriptions, it sounds like until a few weeks ago you could never have built these sites on Netlify - at least not with Gatsby.

Thanks for your thorough response @hrishikesh. Regarding my mention of AWS, it wasn’t made for promotional purposes (not that they need it anyway), if that’s what you’re suggesting, but as a way to confirm whether the build issue was on our side or not, and also to answer @fvieira’s comment. Netlify has been serving us very well for many months, but unfortunately whatever happened that is preventing our site from building as it used to is quite a bummer. I for one would love to continue using it as I have so far, and the fact that we rely on a couple of Netlify services to run our forms and functions makes this potential migration harder, so it’s not exactly something I’m looking forward to getting into.

Thanks again for your aid.

Hi @hrishikesh, I’m back to trying to re-upgrade my websites to Gatsby 4 and unfortunately the problem persists: the build still fails with exit code 137.

I’ve run the Docker image you suggested and used it to build my project, and I never saw memory go over 2GB. In fact, it isn’t even over 1GB at the point where the build crashes on Netlify, so I have no idea why it uses more than 8GB on Netlify…

As soon as I have time I’ll try building an empty Gatsby 4 project on Netlify and see if it also crashes, if it does, then there’s either something very wrong with Gatsby or with Netlify, if it doesn’t maybe it really is something I’m doing in all of my projects (although I doubt it as they were working fine in Netlify until August).

Interesting to hear, @fvieira. In our own testing, the Docker image takes around 1 GB just for the OS with no build running, so it would be pretty shocking for your build to take 0 GB, which is what never going over 1 GB would imply.

Anyhow, we’ll be happy to take another look when you have the gatsby 4 build up here, since we can see memory used after the fact to let you know if that is the problem with some newer build!