Help! Our assets are too big 😅 -- right way to use Netlify for static site with lots of assets?

Problem: a large Gatsby site with many gigabytes' worth of static assets has become cumbersome for us to manage and takes a long time to build.

Yes! Build times have vastly improved in recent months (~9m → ~4m) thanks to Netlify & Gatsby’s incremental build efforts – thank you! But for updating one PDF, that’s still a massive amount of compute for what used to be a two-second FTP drag-and-drop.

😅👉 I know that dates me. We love the global CDN with auto-invalidation, but from a DX standpoint, waiting for a massive site to rebuild and redeploy just for one PDF feels like overkill. (And maybe a little… climate-unfriendly?)

We know we could put the assets elsewhere, but that’s not ideal for our workflow: we’re in the middle of breaking the site into a monorepo and like managing everything in one place.

Idea 1: move static assets into our monorepo’s “root” project, which hosts the top-level netlify.toml file

  • A symlink lets sibling projects access the shared static directory during local dev (sketch below), but it breaks Netlify’s monorepo support, which then (often) triggers a full rebuild of all sub-sites, even when rebuilding without cache.

  • Netlify caching is also an issue – every monorepo build has to download a ~10 GB cache just to figure out what work it can skip; obviously this doesn’t scale.
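For reference, the local-dev half of that symlink idea is tiny – roughly this hypothetical script run from the monorepo root (the `static-assets/` and `sites/` directory names are made up; it’s the Netlify build side where this falls apart):

```ts
// scripts/link-shared-static.ts – hypothetical local-dev helper.
// Symlinks the root-level shared asset directory into each sibling Gatsby
// project's static/ folder so that `gatsby develop` can serve the files.
import { existsSync, readdirSync, symlinkSync } from "fs";
import { join, resolve } from "path";

const sharedDir = resolve(__dirname, "..", "static-assets"); // assumed root asset dir
const sitesDir = resolve(__dirname, "..", "sites");          // assumed sub-site folder

for (const entry of readdirSync(sitesDir, { withFileTypes: true })) {
  if (!entry.isDirectory()) continue;
  const linkPath = join(sitesDir, entry.name, "static", "shared");
  if (!existsSync(linkPath)) {
    symlinkSync(sharedDir, linkPath, "dir"); // "dir" type for Windows compatibility
    console.log(`linked ${linkPath} -> ${sharedDir}`);
  }
}
```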

Idea 2: move static assets out of this Netlify project

  • This could be to a separate project, a separate cloud CDN bucket, or a hosted CMS.

  • Seems like the “normal” thing to do. But it’s confusing – does Netlify want to be a service that can scale gracefully with large static sites? Or should we just take our fat assets elsewhere? 😄

Idea 3: discover that Netlify has some normal way to deal with this!

  • For code changes, a single monorepo project is rebuilt. If only static assets change, [we can dream that] Netlify instantly and gracefully performs a few CDN operations behind the scenes, badabing.

We (CC @mfan @thealice) would love to hear from @fool or another Netlify dev relations person on this because it just seems like such a normal use case – team builds static site, static site grows, team sets up a monorepo and looks around for a next-level solution for assets.


P.S. we are aware of Netlify’s paid Large Media service but the docs say “repositories with multiple connected sites (such as monorepos) are not supported”, and it isn’t clear that it would be any simpler than migrating assets to another cloud service.

BTW we are looking at a two-pronged strategy:

  1. Move the biggest files (videos) to a separate Netlify project that just hosts the files, and set an env var so the dev server builds absolute URLs to that remote host for each video reference in the code (see the sketch below).
  2. Keep all static PDFs and images in the monorepo’s projects, even if we have to duplicate a few of them.

This way we expect the monorepo project builds to be fast (their Netlify cache won’t be bloated with all the video files) and we can keep our all-in-one workflow mostly intact, since video files are infrequently updated and need special consideration like compression anyway.
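For the video piece, the env-var indirection in point 1 is just a small helper along these lines (the `GATSBY_ASSET_HOST` name and the assets-site URL are made up; Gatsby does expose `GATSBY_`-prefixed env vars to browser code):

```ts
// src/utils/assetUrl.ts – hypothetical helper for strategy #1 above.
// GATSBY_ASSET_HOST points at the assets-only Netlify site so every video
// URL becomes absolute; when unset, paths fall back to being relative.
const ASSET_HOST = process.env.GATSBY_ASSET_HOST ?? ""; // e.g. "https://our-assets.netlify.app"

export function assetUrl(path: string): string {
  return `${ASSET_HOST}${path}`; // path is site-relative, e.g. "/videos/intro.mp4"
}

// Usage in a component: <video src={assetUrl("/videos/intro.mp4")} controls />
```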

My takeaway
Now that Netlify bought Gatsby, it seems like one area that’s ripe for improvement is helping monorepo projects cross-manage assets so that they could live in a separate project. Right now Gatsby wants to source external assets back into its own project & graph, which results in massive bloat of the Netlify cache. For example, Gatsby could provide a symlink-style plugin that plays nicely with Netlify’s monorepo support.

We also need an easy way to share code between monorepo projects, which can be solved in a number of traditional ways (npm modules, git submodules, etc.). But it’d be so rad if this ecosystem provided some help doing this!

Thanks

Hey @mosesoak,

I was about to suggest exactly that, until I saw you had already thought of it yourself.

Now, for some context on why we build the entire project even when you only need to update a single file:

Netlify stores information about each file in a deploy to determine what to serve and whether the file exists for that deploy. Each deploy has a list of files in the database, and each file has a SHA1 hash. When someone deploys, Netlify checks whether the SHA1 for a particular file already exists on the CDN. If it does, that file is not requested for upload; instead, it is linked to the new deploy as well as the previous deploy where it already existed.

Now, if you send an API request for a deploy with only a single file in it, Netlify would think your deploy needs only that one file, so every other file for that deploy would throw a 404. Thus, any deploy in which you need to update even a single file needs to specify all the files from the previous deploy along with their SHA1s, as well as the new file with its SHA1. Doing this lets you upload just one file while the rest are simply “carried over”.

You can probably guess now why we need to build your entire site: we don’t know whether you want 1 file updated or 100 before we actually build. Only after the build can we hash all the files and compare them. For all we know, a single update in your repo can change hundreds of other files in the final build, and it’s not possible for us to guess that beforehand.
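To make that flow concrete, here’s a rough TypeScript sketch of the digest step (simplified, no error handling; the site ID, token, and example paths are placeholders – requires Node 18+ for the built-in fetch):

```ts
const NETLIFY_API = "https://api.netlify.com/api/v1";

// Create a deploy by sending the FULL digest: every path in the deploy
// mapped to its SHA1, e.g. { "/index.html": "a94a8fe5...", "/docs/report.pdf": "de9f2c7f..." }
async function createDeploy(
  siteId: string,
  token: string,
  digest: Record<string, string>
) {
  const res = await fetch(`${NETLIFY_API}/sites/${siteId}/deploys`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ files: digest }),
  });
  // `required` lists the SHA1s not already on the CDN; everything else is
  // carried over from previous deploys without being re-uploaded.
  return (await res.json()) as { id: string; required: string[] };
}
```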

But there’s a solution for you. If you deploy using the Netlify CLI (or the API), you don’t have to wait for the entire build on our end. You can run the build locally and, as mentioned above, send just the single updated file. If you already have the list of files with their SHA1s available, you can skip running the build entirely and directly make a few API requests to upload the files.
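With the CLI, that’s essentially building locally and then running `netlify deploy --dir=public --prod`. If you go the raw API route instead, the follow-up upload step looks roughly like this (continuing the sketch above – only files whose SHA1s came back in `required`, e.g. your one updated PDF, get sent):

```ts
import { readFile } from "fs/promises";

// Upload only the files Netlify asked for in `required`; `digest` is the same
// path -> SHA1 map used to create the deploy, and `localRoot` is the folder
// holding the built files (e.g. "public").
async function uploadRequired(
  deploy: { id: string; required: string[] },
  token: string,
  digest: Record<string, string>,
  localRoot: string
) {
  for (const [path, sha] of Object.entries(digest)) {
    if (!deploy.required.includes(sha)) continue; // already on the CDN
    await fetch(`https://api.netlify.com/api/v1/deploys/${deploy.id}/files${path}`, {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/octet-stream",
      },
      body: await readFile(`${localRoot}${path}`),
    });
  }
}
```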
