Rogue function blew through my 150,000 calls in no time

This concerns my blog, https://frankstall.one aka https://frankstallone.netlify.app/. I received an email from Netlify telling me I was at 50% of my function invocation limit and thought nothing of it. My blog has had 70 visitors in the last 30 days (source: GA). I have a Like button that uses Astro DB. It makes a call when someone loads the page and, of course, when someone clicks the Like button. It’s really simple functionality: the first hybrid SSR element I have added to an Astro site. It has been live for months without coming anywhere near the invocation limit. Then I received an email saying I was at 90%. At that point, I knew something was terribly wrong and that I was probably already over the limit. By the time I got to the logs, I could see that I was.

I went to the function logs and, to my surprise, something was churning away, eating up function calls. In a panic, I removed the Like functionality completely and turned the site back into a static site. Now, when I go back to Logs > Functions, I see nothing.

This is slightly concerning because I have no idea whether I am going to be charged, and I can’t go further back in time to see what happened. Not that I could ascertain much from the logs anyway: this is what they looked like before I switched the site from hybrid to static, and they give me nothing I can trace the calls to…

(See the next post for the screenshot; as a new user, I can only embed one media item per post.)

I tried Ask Netlify to find out how much 15,000 calls beyond my Functions Level 0 (free) invocations would cost me, and it repeatedly gave me no answer. 😅🙏

  1. How does one diagnose issues like this instead of frantically removing functionality from their site?

  2. How does one know if they’re going to get charged? I imagine I’ll get SOME email notification saying I went over, since I did get two previous ones, but am I in limbo right now?

A general statement of fear: this gives me the same vibes as stories of people with seemingly small websites who end up with crazy bills from their hosts. To be clear, I don’t see that happening here; it looks like Level 1 is $25. I am lucky, and grateful, that Netlify has such a “generous free tier”. Still, I would be remiss if I didn’t say that I have grown to fear “generous free tiers”. They’re great marketing, but when things go wrong like this, they can get extremely expensive, fast.

Feel free to AMA.

Screenshot of Logs > Functions: function calls churning away… burning CPU time for nada.

Hey Frank,

Did you change some settings around PartyTown on your website? I am seeing non-stop requests to PartyTown/ProxyTown in the network tab of the Chrome developer tools when viewing your website’s homepage. A significant portion of your website’s traffic is coming from that.

Hey Ramon, thanks for your quick response! The short answer is no. I haven’t touched anything Partytown-related since I originally created the site. I use it for the scripts in the header, starting on line 192 here, and that’s it. I have, however, updated the Astro package over the last few months, as I typically do with all dependencies.

Also worth noting: nothing on my home page called the Like button functionality, which was the only SSR function on the site. 🤔

All right. I think PartyTown is misbehaving, and it’s definitely generating a lot of 404 requests from some browsers, but that won’t trigger functions.

Something I do notice is that /Users/ramon/netlify/git/frankstall.one/.netlify/functions-internal/ssr/.netlify/build/pages/_image.astro.mjs may be doing some SSR for images.

Using git bisect, I found this:

Introduced today in this commit: “dep: ⬆️ Updating astro deps” (frankstallone/frankstall.one@3e4f1ba)

Did you get the first email about the invocation limit today, or before? If you got it today, I think this is likely the cause.

Edit: More bisecting shows that this occurs when astro is at version >=4.10.3 and @astrojs/netlify is at version >=5.3.3. You can try downgrading either or both to their previous versions, which may help. If it does, you’ll want to open a bug report with Astro.

The first email came on Saturday at 4:33 AM ET, saying I had hit 50%, which I knew was off. I was away the whole weekend and unable to look into it. I updated the deps today to see whether the sea of function calls in the logs would stop. It didn’t.

As far as I know, all my images are in the Git repo and served as static assets. The only SSR component should have been the Like button.

The “Like” functionality is the only user-defined SSR on your website, but the Astro adapter adds framework-defined SSR functions (like many framework adapters).

Since the timeline doesn’t match up with the image catch-all, I looked again at PartyTown traffic and its corresponding logs.

The URL for almost all of your function invocations is /~partytown/proxytown (there is also some usage of the /api/like-post/* function). Then there is a long tail of classic WordPress scanning URLs, such as //xmlrpc.php and //shop/wp-includes/wlwmanifest.xml. Obviously, these aren’t real functions; like /~partytown/proxytown, they return a 404 status code.

My current theory is that the Astro adapter installs a catch-all SSR function for all URLs that don’t match static assets. After building your site with npx astro build, you can see a hint of that on line 4 of this file: .netlify/functions-internal/ssr/ssr.mjs. I’ll have to ask our frameworks team (who have expertise in this adapter and a relationship with the Astro team) to confirm whether this interpretation is correct. I believe this is currently the expected behavior, but I’d like to confirm it, because if it isn’t, something else might be wrong with the site setup.

The other side of the issue is that PartyTown, in some cases, is making requests to the server instead of using the ServiceWorker (when the ServiceWorker is active, the requests are resolved locally), PLUS it enters an infinite loop that generates many requests (this I can see even when the ServiceWorker is working). As a result, the browser sends many requests to the non-existent route /~partytown/proxytown, and each call triggers a function invocation. I don’t know whether we have enough knowledge of PartyTown to help debug this. I believe this is the root cause of the issue; you’re effectively DDoSing yourself.
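To make the cost difference concrete, here is a toy TypeScript model of the theory above. It is not the adapter’s real routing code: the `Deployment` type, the `handle` and `countInvocations` functions, and the example paths are hypothetical, and the model simply assumes that any path without a prebuilt static file is handed to the catch-all SSR function, costing one invocation per request.

```typescript
// Toy model of the catch-all theory; NOT the Astro adapter's real routing code.
// Assumption: with a catch-all SSR function, any path without a prebuilt
// static file is handed to that function, costing one invocation per request.

type Deployment = {
  staticPaths: Set<string>; // files produced by the build (hypothetical)
  hasCatchAllFunction: boolean; // true for the hybrid build with the adapter
};

type Result = { status: number; invokedFunction: boolean };

function handle(path: string, deploy: Deployment): Result {
  if (deploy.staticPaths.has(path)) {
    // Served straight from the CDN as a static file.
    return { status: 200, invokedFunction: false };
  }
  if (deploy.hasCatchAllFunction) {
    // The function runs, finds no matching route, and answers 404.
    return { status: 404, invokedFunction: true };
  }
  // Static-only site: the CDN answers 404 without running any function.
  return { status: 404, invokedFunction: false };
}

function countInvocations(paths: string[], deploy: Deployment): number {
  return paths.filter((p) => handle(p, deploy).invokedFunction).length;
}

const staticPaths = new Set<string>(["/", "/blog/"]);
// One browser stuck in a loop, hitting the same non-existent route 1,000 times.
const loopingRequests = Array.from({ length: 1000 }, () => "/~partytown/proxytown");

const hybrid: Deployment = { staticPaths, hasCatchAllFunction: true };
const staticOnly: Deployment = { staticPaths, hasCatchAllFunction: false };

console.log(countInvocations(loopingRequests, hybrid)); // 1000
console.log(countInvocations(loopingRequests, staticOnly)); // 0
```

In this model, the same 1,000 looping requests cost nothing on the static build because the 404 is answered without running a function; the traffic only becomes expensive once a catch-all function sits behind every unknown URL.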

I’ll note that this stopped about 9 or 10 hours ago. However, there were other periods of calm in the past week, so it may simply be that no affected browser is currently making requests, and it will resume eventually.

Here’s what I know. I use this same Astro setup with Partytown on multiple websites, all deployed on Netlify. None of them have run into this particular issue, but they have always been static sites; this one was my first experiment with a hybrid build on Astro and Netlify.

To name a few: Skin Schema, Designzen, and Straightforward Growth.

I don’t see that hint when I run npm run build. I have systematically tried to remove all the SSR bits from this site to stop the function invocations. Most recently, I removed the Netlify adapter completely (†link redacted; see screenshot and last sentence below). Maybe that stopped it?

Right now, I don’t want any of this. This is a hard pass on SSR or hybrid for me. I simply don’t want to risk some random rogue dependency causing a DDoS-style flood of requests that blows through my free allocation of function invocations, an allocation so astronomically high for a simple blog that it’s ridiculous. I don’t want Netlify or Astro doing anything with SSR for images. I didn’t ask for it.

I got my “Welcome to Functions Level 1” email this morning at 4:33 AM ET, and, if need be, I will gladly pay Netlify to never have to deal with this again. I apologize if this comes off as sharp. I am frustrated, exhausted, and confused… † And this is the second time I have had to amend my forum post because of an error about “that host”, which I am still trying to make sense of.

By “the host”, I assume it means the website that the image is hosted on. It’s likely a site that serves a lot of spam, but I don’t know which one you used, so I’m not sure.

Indeed, the invocations stopped yesterday at around 9 AM Eastern Time. So if you don’t intend to use SSR on that site again, this is resolved.

I understand your frustration, but I can only help find the root cause within the scope of Netlify’s systems.

I looked at your other websites. Skin Schema and Designzen don’t have the infinite loop issue with PartyTown, but Straightforward does. I see the same event names in the ServiceWorker’s network log in the Chrome developer tools, so the two sites likely have something in common.

I don’t know why this infinite loop occurs on these two sites and not the other two; the cause is likely in your PartyTown setup code. I also don’t know why PartyTown makes real network requests (which return 404) in some browsers, and only on this website (not the other three); the cause is likely in PartyTown’s code or your PartyTown setup. I do recommend you eventually fix one or both of these issues.

If you want to try SSR again: I spoke to the relevant engineering team about this, and since all of the invocations come from wasted 404 requests for the same URL, they suggested that you manually define an API route for POST /~partytown/proxytown that sets cache headers, likely with the durable directive, so that the function would be executed only once rather than on every request. This is still wasteful, since the website keeps sending these requests, but it would cost the same as it does with SSR disabled (to be clear: the fake traffic is still generated, but it no longer invokes a function each time).
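A minimal sketch of what that route might look like, assuming Astro 4’s hybrid output and Netlify’s `Netlify-CDN-Cache-Control` header with the `durable` directive. The file path `src/pages/~partytown/proxytown.ts`, the one-year `s-maxage`, and the choice to keep answering with an empty 404 (the same response visitors get on the static build) are assumptions for illustration. Many CDNs do not cache POST responses by default, so confirm with Netlify that this caching applies here before relying on it.

```typescript
// src/pages/~partytown/proxytown.ts (hypothetical file path)
// Sketch of the suggested workaround, assuming Astro 4 hybrid output on Netlify.

// Opt this one route out of prerendering so it is served on demand.
export const prerender = false;

const CACHE_HEADERS: Record<string, string> = {
  // Browsers: always revalidate instead of keeping a local copy.
  "Cache-Control": "public, max-age=0, must-revalidate",
  // Netlify's CDN: keep the response for up to a year; `durable` asks Netlify
  // to store it in its shared durable cache rather than per edge node.
  "Netlify-CDN-Cache-Control": "public, durable, s-maxage=31536000",
};

// Answer with the same empty 404 a visitor gets on the static build, but with
// cache headers, so (if Netlify caches it) the function should run once, not
// once per looping request.
export const POST = (): Response =>
  new Response(null, { status: 404, headers: CACHE_HEADERS });
```

If the CDN does cache this response, repeated requests for /~partytown/proxytown would be answered from the cache; the looping traffic itself is still generated by the affected browsers.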

Regarding “the host”: this forum error pop-up came from a link to GitHub, not an image.

What Straightforward Growth and my blog have in common is Hotjar. Neither of the other sites uses it.

I am still confused and trying to figure out what is going on. Let me try to articulate it, and tell me if I have it right. There is something wrong with the Partytown implementation. If I run a site with SSR, that Partytown issue turns into an unbelievable number of function invocations. If I run a static site, the Partytown issue becomes, effectively, fake traffic whenever a user is on the page?

I wonder if Hotjar is touching some HTML attributes that Partytown is observing. I don’t know enough about Partytown to investigate without spending a significant amount of time on it.

Yes, you’ve articulated it correctly: the same issue occurs with and without SSR, but it’s far less costly on a static site. There’s one asterisk: it’s not happening for all users. In my browser, the Partytown requests are properly intercepted by the ServiceWorker and never sent to the server. From the logs, it’s evident that in some browsers they do reach the server. Looking at the last few days, I see three user agents (two Safari variants and one Chrome variant) and 10 IP addresses.

Thank you. Can you tell me where you are seeing these logs?

These are in our internal support system. Enterprise customers can set up log drains to their own logging systems, such as Datadog or New Relic, to see these logs themselves (and perhaps, in the future, paid non-enterprise customers will be able to as well).