One single error 128 on system re-deploy

(I’m not raising a support ticket - this is a free account and I’m just experimenting. I’m freeloading for now, so I’ll take what I can get without making anyone look at my site or logs!)

The Problem
I get one single “this function has crashed” when I re-deploy my production site.
F5-ing the site gets rid of it, but it’s a black screen of death I don’t want even one single user to ever see.

Details
The function hasn’t actually crashed, of course; it has just been updated.

  • I could get around that by deploying a different branch to staging (I tested that already) and then cutting over. Nonetheless I’m looking at the issue now.
  • The Netlify AI chat bot reckons this is related to my GitHub repo, but the error is clearly related to a third party MongoDB which I’m trying to open on site launch. I’m dealing with that with them anyway; that’s not what this question is about.

Rather I’m wondering if there’s a way I can prevent this 128 error hitting a user screen.

  • Can I catch and retry this or something? It looks like a timeout, and it works perfectly on F5.
    Maybe I just need a longer pause between tear-down and re-publish or something.
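
A minimal sketch of the catch-and-retry idea, assuming the failing step can be wrapped in a function that returns a promise. The names `withRetry`, `attempts` and `baseDelayMs` are made up for illustration and are not part of any Netlify or MongoDB API. Since the driver's server selection timeout in the log is 30000 ms, a retry only helps in practice if each attempt is allowed to fail quickly, for example by lowering the driver's `serverSelectionTimeoutMS` option.

```typescript
// Hypothetical helper: run an async operation, retrying a few times
// with a growing pause between attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Linear backoff: wait longer after each failed attempt.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

Usage would look something like `await withRetry(() => client.connect())`, where `client` is whatever MongoDB client the site already creates.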

If you’re seeking volunteer support don’t also insist on blindfolding them and tying their hands behind their back.

Provide as much detail as you can.

Outside of mentioning MongoDB you’ve provided zero information concerning what your site is built with.

From what I can tell you’re serving page views directly via function calls at runtime, is that correct?

That’s correct. I think it’s Node 20.x hosting a SvelteKit site using a standard SvelteKit adapter. Netlify doesn’t give out much technical information, but it says “Functions level 0”. The site “endpoint” is /.netlify/functions/sveltekit-render

The Mongo access comes from Auth.js using a standard MongoDB adapter, which makes a connexion as a Mongo client at startup. I have one “scheduled function”, which uses an identical MongoDB client, and that works fine and never errors. I can’t log the error to MongoDB, as it happens while making that connexion, but I do have Netlify’s server logs.
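
One way to keep a failed startup connexion from taking the whole function down might be to connect lazily and forget a failed attempt, so the next request tries again. The sketch below assumes this; `lazyConnection` is a hypothetical helper, and how it would be wired into the Auth.js adapter is not shown because that depends on what the adapter version accepts.

```typescript
// Hypothetical helper: connect on first use, share the in-flight promise
// between concurrent callers, and drop it again if the attempt fails.
function lazyConnection<T>(connect: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = connect().catch((error) => {
        // Forget the failed attempt so the next caller retries from scratch.
        pending = undefined;
        throw error;
      });
    }
    return pending;
  };
}
```

Because the rejection is passed on to whoever awaits the returned promise, it can be caught in request handling rather than surfacing as an unhandled rejection at module load.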

Typically Netlify “wake from idle” works fine, I can see the smooth start up in the Netlify server logs there. The issue occurs for me only when upgrading the system.

What Works

Here’s a typical upgrade scenario, showing fault-free behaviour:

That shows my site’s startup preamble with a truncated Mongo connexion string, the hash and date of the build etc. The first startup attempt fails with the Error 128, which is coming from Mongo. The failure looks like it may take a couple of minutes in the log, although the timeout is presumably 30s.

What doesn’t work

The user symptom is that the first HTTP request to the Netlify cluster [?] after an upgrade typically returns the 128 error instead of the expected site HTML; there’s a black screen with a Netlify error box in the middle of it. Refresh that page, and all is well again. Upgrades are pulled by Netlify from GitHub on the main branch.

Looking at those Netlify logs, a typical extract is shown below. The error comes from MongoDB, and that’s something I need to work out with them. However, it’s always going to be possible for such an error to occur in operational conditions for whatever reason. I need to understand how to catch that type of error more gracefully if I’m to use this for paying customers.
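
As a rough idea of what “more gracefully” could mean, the sketch below wraps rendering and, on failure, returns a short 503 page that reloads itself instead of letting the error escape. It only helps if the connexion error is actually thrown inside request handling; the log reports an unhandled promise rejection, which suggests that is not currently the case. `renderOrFallback` and `retryAfterSeconds` are hypothetical names.

```typescript
// Hypothetical wrapper: try to render, and fall back to a temporary
// self-refreshing page if rendering throws.
async function renderOrFallback(
  render: () => Promise<Response>,
  retryAfterSeconds = 5,
): Promise<Response> {
  try {
    return await render();
  } catch (error) {
    console.error("Render failed; serving a temporary fallback page", error);
    const html =
      `<!doctype html><meta http-equiv="refresh" content="${retryAfterSeconds}">` +
      `<p>Just starting up. This page will reload in a moment.</p>`;
    return new Response(html, {
      status: 503,
      headers: {
        "content-type": "text/html; charset=utf-8",
        "retry-after": String(retryAfterSeconds),
      },
    });
  }
}

// Possible wiring in src/hooks.server.ts (untested assumption):
// export const handle = ({ event, resolve }) =>
//   renderOrFallback(async () => resolve(event));
```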

I suppose I could change my deploy so I push from GitHub and always hit the site a couple of times after deploy; that would get around this issue.
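
The “hit the site a couple of times after deploy” idea could be sketched as a small warm-up script run at the end of the deploy. Everything here is an assumption for illustration: `warmUp`, the `Fetcher` type, the attempt count and the delay are made-up names and values, and the fetch function is passed in so the script can be exercised without a network.

```typescript
// Minimal shape of what the warm-up needs from a fetch-like function.
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number }>;

// Hypothetical warm-up: request the site until it answers successfully,
// and return how many attempts that took.
async function warmUp(
  url: string,
  fetcher: Fetcher,
  maxAttempts = 5,
  delayMs = 2000,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetcher(url);
      if (response.ok) return attempt;
    } catch {
      // A network failure counts as a failed attempt; keep going.
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Site at ${url} did not respond successfully after ${maxAttempts} attempts`);
}

// Example call with Node's built-in fetch (placeholder URL):
// await warmUp("https://example.netlify.app/", fetch);
```

This does not remove the first failed request; it only makes it likely that the script, rather than a real user, is the one to receive it.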

My Mongo connexion is failing, and that looks like it’s causing the two subsequent errors below. A simple site refresh and Mongo is all happy again, so the site starts and continues to work. I’m trying to work out if there’s anything I can do to make that a bit smoother, especially for the one user who will get the black screen error message thing.

Apr 23, 03:22:27 PM: f526cad1 ERROR  Unhandled Promise Rejection 	{"errorType":"Runtime.UnhandledPromiseRejection",
 "errorMessage":"MongoServerSelectionError: Server selection timed out after 30000 ms",
 "reason":{"errorType":"MongoServerSelectionError",
 	"errorMessage":"Server selection timed out after 30000 ms",
 		"reason":{"type":"ReplicaSetNoPrimary","servers"
...
	at process.<anonymous> (file:///var/runtime/index.mjs:1276:17)","    
	at process.emit (node:events:518:28)","    
	at emit (node:internal/process/promises:150:20)","    
	at processPromiseRejections (node:internal/process/promises:284:27)","    
	at processTicksAndRejections (node:internal/process/task_queues:96:32)","    
	at runNextTicks (node:internal/process/task_queues:64:3)","    
	at listOnTimeout (node:internal/timers:540:9)","    
	at process.processTimers (node:internal/timers:514:7)"]}

Apr 23, 03:22:27 PM: f526cad1 ERROR  [ERROR] [1713882147748] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 400.

Apr 23, 03:22:27 PM: f526cad1 ERROR  RequestId: f526cad1-6302-47a5-b765-c2208f7be8fa Error: Runtime exited with error: exit status 128

Apr 23, 03:22:27 PM: f526cad1 ERROR  Runtime.ExitError

@phil-w This is a non-typical usage of Netlify (at least from my perspective), so it’s not something I can really dig into or provide any pointers on.

I presume you’ve googled around for the errors you’re seeing?

For example this Stack Overflow that discusses the timeout error message:

No worries, I’ll continue to do exactly that - google around especially for the MongoDB issue, which I expect is solvable. I was doing the same for the 128, but I don’t absolutely need to solve it at this point.

I expect the work-arounds already mentioned will work should I need to avoid the issue. Thanks.