Skipping CDN file diff

I deploy to Netlify using Netlify CLI. I have a site with many thousands (around 20k files) of small pages and the CDN diff is taking anywhere between 6 and 45 mins. It’s really variable.

I suspect the diff is way slower than just uploading the whole bundle each time.

Is there a way to skip the CDN diff and just always upload the entire bundle of files?

Looking at this in the Netlify UI, it appears that it might in fact be taking ages in the upload step:

12:39:13 PM: Creating deploy upload records

In Github it looked like it was the diff step.

There are a lot of files, but gzipped it’s only a few MBs. Unzipped it’s quite big though.

Does netlify CLI gzip the files before transfer?

What’s the recommendation for sites with very large numbers of small files so the deploy doesn’t take ridiculously long?

File-diffing cannot be skipped. It’s one of the primitives of how Netlify works. We check our CDN for the SHA of all your files. For any of the SHAs that are already present on Netlify, we don’t upload that file.

So, this goes without saying: the higher the number of files, the more time the diffing will take, as we need to compare the SHA of each of those files to detect which files we actually need to upload and which already exist on the CDN.

No, we upload the files as they are, and apply whatever compression is available when serving them.

No special recommendation - just make sure all kinds of post-processing is disabled in the site’s settings and form detection is disabled too.

Thanks for the reply.

The thing I’m most concerned about is that it is eating up all my minutes on Github actions.

Do you think it would make a difference if I did the deploy as a multi-step process:

  1. Create a gzipped tarball of the site in Github actions
  2. Deploy gzipped tarball with accompanying untar script
  3. Have Netlify run the untar script in its build env
  4. Start the app

Is that even possible? i.e. can I trigger a build on Netlify after deploying via the Netlify cli?

I’m not entirely sure if it’s the upload or the CDN diff that is taking ages. How can I determine that?

Does CDN diff happen after upload or does it happen in parallel, i.e. does it start as soon as the first file lands on storage? (No sense in creating a tarball deploy only to find that the CDN diff is what is taking ages)

I’d suggest that the mere act of tarring up the build and committing it to the repo will start a build; that build can then untar the files without using any additional features of Netlify nor GitHub :smiley:

However - we are still going to have to diff the files as Hrishikesh pointed out (we’ll do it in our system instead of GitHub’s). But maybe it would save you some money.

The better solution that will serve you best for the long term is to read and follow this guide about not changing that many files with each deploy (better for build minutes / GH actions minutes, better for your visitors’ site performance, better for you waiting around less :))

Thanks for the reply.

I’d suggest that the mere act of tarring up the build and committing it to the repo will start a build

I’m not sure I understand what you are saying.

The reason I asked was that the Netlify projects are not hooked up to Github, I am deploying all the files, which are built in Github actions, using Netlify-cli.

Can a build be triggered after deploying files to Netlify using Netlify-cli?

Hi, @mjgs. If you want to deploy to Netlify, all of the files must be checksummed for every deploy. That is a hard requirement and there are no exceptions. It is only a matter of choosing where to do so (our systems, your local system, GitHub’s systems, the build system of a third-party CI/CD service, etc.).

About triggering builds, the short answer is that there are two ways to deploy:

  • a build & deploy in the Netlify build system
  • a manual deploy

The manual deploy process itself can be done in the following ways:

  • use the Netlify CLI tool (your current method)
  • “drag and drop” in the Netlify UI
  • direct API access (using either custom code or the Netlify JavaScript client)

However, without exception, all methods listed above require that the SHA1 checksum of every single file in the deploy is calculated and sent to our API. There is no workaround to remove this requirement. The following is always true and will never change (based on my understanding of the design of our API and internal architecture):

  • All files must be checksummed for all deploys. Every file. Every deploy.

If you build & deploy at Netlify, the build system here calculates the checksums. With the CLI tool, it calculates the checksums (in your case on servers controlled by GitHub). For drag and drop, the client side javascript in the browser makes the checksums. With direct API access, the calculation of the checksum is again done by the client. (Note, the API also checksums the uploaded files to make sure they match.)

To summarize, there is no way to avoid calculating these checksums. You can choose where to do it but it must be done to deploy.
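As a rough illustration of what "calculating the checksums" involves on the client side, here is a minimal Python sketch that walks a publish directory, computes a SHA1 for every file, and times the hashing. The function names (`sha1_of_file`, `build_manifest`, `demo`) and the generated test pages are hypothetical and are not taken from the Netlify CLI. Timing this step on its own can be one way to tell whether hashing or something else dominates a slow deploy.

```python
import hashlib
import tempfile
import time
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(publish_dir: Path) -> dict[str, str]:
    """Map every file path (relative to publish_dir) to its SHA1."""
    manifest = {}
    for path in sorted(publish_dir.rglob("*")):
        if path.is_file():
            rel = "/" + path.relative_to(publish_dir).as_posix()
            manifest[rel] = sha1_of_file(path)
    return manifest


def demo(file_count: int = 200) -> tuple[dict[str, str], float]:
    """Create throwaway pages (illustrative data only) and time the hashing."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in range(file_count):
            page = root / f"page-{i}" / "index.html"
            page.parent.mkdir(parents=True)
            page.write_text(f"<h1>Page {i}</h1>")
        started = time.perf_counter()
        manifest = build_manifest(root)
        elapsed = time.perf_counter() - started
    return manifest, elapsed


if __name__ == "__main__":
    manifest, elapsed = demo()
    print(f"hashed {len(manifest)} files in {elapsed:.3f}s")
```

Pointing `build_manifest` at a real build output directory instead of the generated pages gives a feel for how long the hashing alone takes on a given machine.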

Thanks @luke

I’ve gone back to using a small test data set during development, so my immediate issue of using up all my build minutes is manageable again.

Just want to make sure I understand what you are saying, in the “a manual deploy” case you list, which is what I am doing via netlify-cli, there is no way to trigger a Netlify build after file upload but before deploy to the CDN, is that correct?

Hi, @mjgs. Correct, you cannot trigger builds within a manual deploy.

You can do a manual deploy and then, after it completes, trigger a new deploy which is a continuous deployment deploy with a build & deploy at Netlify. However, they are two separate deploys and the files must be checksummed in both. There is no escaping checksumming all files for all deploys.

I want to help you understand why this is required.

The API needs to know which files are the same and which files are different. It also must ensure that the uploaded files match the local files. This is why all files are checksummed both before and after uploading.

Let’s say your site has 100 files. When you first deploy using a manual deploy this is the workflow:

  1. You build outside of Netlify.
  2. You checksum all the files to be deployed.
  3. You send the manifest of all filenames and checksums to the API.
  4. The API says “all files are new so upload them all to the API”.
  5. You upload all the files to the API.
  6. The API checksums the files again.
  7. If all checksums match (and there are no other errors) the deploy is successful.

Now, let’s say that you change five files and add five new files. The site now has 105 files and 10 of those have never been seen on this site before (10 because 5 are new and 5 are changed).

This is the workflow for the second deploy:

  1. You build outside of Netlify.
  2. You checksum all the files to be deployed.
  3. You send the manifest of all filenames and checksums to the API.
  4. The API says “only 10 files are new so upload only these 10 files to the API”.
  5. You upload the 5 new files and the 5 changed files to the API.
  6. The API checksums the uploaded files again.
  7. If all checksums match (and there are no other errors) the deploy is successful.

This is why the checksums are required for every deploy. Please let us know if there are any other questions about why this requirement exists.
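The two deploys described above can be simulated in a few lines of Python. This is only a sketch under stated assumptions: `FakeDeployApi` is a hypothetical in-memory stand-in, not Netlify's implementation, and the page names and contents are made up. It mirrors the steps: checksum everything, send the manifest, upload only what is requested, and let the receiving side checksum the uploads again.

```python
import hashlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class FakeDeployApi:
    """Hypothetical in-memory stand-in for the deploy API (not Netlify code)."""

    def __init__(self) -> None:
        self.stored: dict[str, bytes] = {}  # checksum -> content already uploaded

    def create_deploy(self, manifest: dict[str, str]) -> set[str]:
        # Steps 3-4: receive the manifest, answer with checksums not yet stored.
        return {digest for digest in manifest.values() if digest not in self.stored}

    def upload(self, claimed_digest: str, content: bytes) -> None:
        # Step 6: checksum the upload again and compare with the manifest value.
        if sha1(content) != claimed_digest:
            raise ValueError("uploaded file does not match its checksum")
        self.stored[claimed_digest] = content


def deploy(api: FakeDeployApi, site: dict[str, bytes]) -> int:
    """Run one deploy and return how many files were actually uploaded."""
    manifest = {path: sha1(content) for path, content in site.items()}  # step 2
    required = api.create_deploy(manifest)
    uploaded = 0
    for path, content in site.items():  # step 5: upload only what was requested
        if manifest[path] in required:
            api.upload(manifest[path], content)
            required.discard(manifest[path])  # identical content is sent once
            uploaded += 1
    return uploaded


def run_two_deploys() -> tuple[int, int]:
    api = FakeDeployApi()
    site = {f"/page-{i}.html": f"page {i} v1".encode() for i in range(100)}
    first = deploy(api, site)
    for i in range(5):  # change five existing files
        site[f"/page-{i}.html"] = f"page {i} v2".encode()
    for i in range(100, 105):  # add five new files
        site[f"/page-{i}.html"] = f"page {i} v1".encode()
    second = deploy(api, site)
    return first, second


if __name__ == "__main__":
    print(*run_two_deploys())  # prints "100 10"
```

Both deploys hash every file, but the second one uploads only the 10 files whose checksums were not already known.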

Thanks for the info @luke

I totally understand the need to checksum all the files during CDN upload. Makes sense.

I was hoping it would be possible to trigger a Netlify build after the file transfer, but before the CDN upload. That would have enabled the following workflow:

  1. Build in Github actions, outputting a gzipped tarball of the website
  2. Use Netlify-cli to upload the gzipped tarball to Netlify
  3. Trigger a small build script on Netlify that untars the website tarball
  4. Netlify diffs and uploads files to CDN

That would make the Github to Netlify deploy very quick because instead of transferring hundreds of Megabytes it would be just a few Megabytes.

That would ensure I don’t use up all my Github actions minutes.

Is anything like that possible?

Are there any plans to add compression to the Netlify-cli upload?

Seems like it would be an obvious way to speed up deploys for everyone.

Not a tarball, but we do support a plain zip for uploads using the API. That probably won’t cut down on the transfer size, but it will send the entire zip to Netlify and then Netlify would have to diff it on its servers. I believe you can still save your GitHub Actions time in this manner.

Compression on the local device would still take time, so it’s a plus, minus situation.
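For anyone wanting to experiment with the zip route, here is a hedged Python sketch that zips a publish directory in memory and prepares (but does not send) an HTTP request. The endpoint path and the `Content-Type: application/zip` header reflect my understanding of Netlify's public API documentation and should be verified there before use; `zip_directory`, `build_zip_deploy_request`, the site ID, and the token are all hypothetical placeholders.

```python
import io
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def zip_directory(publish_dir: Path) -> bytes:
    """Zip every file under publish_dir, keeping paths relative to it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(publish_dir.rglob("*")):
            if path.is_file():
                archive.write(path, arcname=path.relative_to(publish_dir).as_posix())
    return buffer.getvalue()


def build_zip_deploy_request(
    site_id: str, token: str, payload: bytes
) -> urllib.request.Request:
    """Prepare the POST request without sending it.

    Assumption: the URL and headers below follow Netlify's documented zip
    deploy method; check the current API docs before relying on them.
    """
    url = f"https://api.netlify.com/api/v1/sites/{site_id}/deploys"
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "index.html").write_text("<h1>Home</h1>")
        payload = zip_directory(root)
    request = build_zip_deploy_request("YOUR_SITE_ID", "YOUR_TOKEN", payload)
    print(request.get_method(), request.full_url, len(payload), "bytes")
    # To actually deploy, the request would be passed to urllib.request.urlopen.
```

The sketch deliberately stops before sending anything, so it can be run safely to check the archive contents and size first.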

Where would I find example code for an api deploy?

Hi, @mjgs. If you want to work with undocumented parts of the API, this support guide is the starting place:

https://answers.netlify.com/t/support-guide-understanding-and-using-netlifys-api/160

Please read that support guide above. If there are any questions after that, please let us know what steps you have tried so far and what the results were.

Thanks for sending the link.

I wasn’t aware that this part of the API was undocumented. Is deploy via API something other people are successfully doing currently?

Also from one of the docs you sent links to:

https://github.com/netlify/js-client#site-deployment

Support for site deployment has been removed from this package in version 7.0.0. You should consider using the deploy command of Netlify CLI.

Is site deployment via API still supported?

Hi, @mjgs. Yes, it is still supported. Every time someone does a drag and drop deploy, it is using this API.

Drag and drop is still a supported deployment method and it will continue to be. There are no plans at all to deprecate that feature or API endpoint.

A quick additional question about the diff.

So I was thinking maybe I’ll just upload the files that changed in the build, but it looks to me that no matter how I deploy all the existing files get deployed over the previous deploy, so what’s the point in the diff?

Is there a way to deploy just a few files?

I’m not sure how you got that @mjgs. In the entire conversation we’re having here, we’ve been telling you only the changed files get uploaded.

You need to send a list of all the files that you wish to upload, not the files themselves. Once you send the list of files with their SHAs, we will request only the files that are missing on our CDN (as compared using SHAs).

Instead of going back and forth, it would be worth trying it out yourself, so you can see it happen live.

Thanks for the reply. Sorry I wasn’t very clear in my last post.

Where I’m at with this is that I’m almost certainly not going to have any time to build any API stuff. The other thing that happened in the past couple of days is that the deploys jumped from 30-40mins to 3-4 mins, and I didn’t make any changes. It’s like everything was slowed down by a factor of 10 for about a week, then it’s ‘magically’ fine again. So perhaps it’s all moot, if the deploys continue to be in the 3-4 mins range.

In my previous post I was just saying that when I deploy using the cli, though I understand on your end you do a diff, from the end user perspective all existing files get replaced by the new set specified. So it’s like there is no point in a diff. I could not see a way, using the cli, to just deploy a few files, rather than the whole set.

If there is some way to configure things so I can upload just changed files via the cli, please let me know.

Hopefully I’m not coming across as argumentative, that’s definitely not intentional, it’s been quite a frustrating week, and just trying to describe the situation.

I’ve run out of build minutes now, so I’ll need to wait a few days to try anything new.

Hi, @mjgs. There is a false assumption being made here:

In my previous post I was just saying that when I deploy using the cli, though I understand on your end you do a diff, from the end user perspective all existing files get replaced by the new set specified. So it’s like there is no point in a diff. I could not see a way, using the cli, to just deploy a few files, rather than the whole set.

The CLI is already sending only the few changed files. It happens automatically and invisibly to you, but that is what is happening.

  • The CLI sends the deployment API endpoint a list of all file paths and the SHA1 checksums for all files.
  • The API looks at all files in the deploy history for the site. It doesn’t just check the previous deploy. It checks all files in every deploy for this individual site.
  • The API then sends the CLI a list of the files where the SHA1 checksum has never been seen before.
  • The CLI then uploads only the files that have not been seen before for this site.

This means if a site has 1000 files but only 5 files are new or have changed, the API only requests those five files. The checksums prevent wasting time. They prevent the sending of the other 995 files which have been uploaded already.

However, this can only be done if you checksum all files for every deploy. This is why the checksumming is always required. It is the only way for the API to know which files are new.

Without the checksums, then you would be forced to upload every file with every deploy. With the checksums, now only changed or new files are sent.
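The point about the API checking the whole deploy history, not just the previous deploy, can be illustrated with a small Python model. This is a simplified sketch: `SiteHistory` is a hypothetical class, the page contents are invented, and checksums are recorded as "seen" as soon as they are requested rather than after a verified upload.

```python
import hashlib


def sha1(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def manifest_of(site: dict[str, str]) -> dict[str, str]:
    """Every file is checksummed for every deploy, changed or not."""
    return {path: sha1(content) for path, content in site.items()}


class SiteHistory:
    """Hypothetical model of the checksums seen across all deploys of one site."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def required_paths(self, manifest: dict[str, str]) -> list[str]:
        # Request only files whose checksum has never been seen for this site.
        missing = sorted(p for p, d in manifest.items() if d not in self.seen)
        # Simplification: assume the requested files are then uploaded correctly.
        self.seen.update(manifest.values())
        return missing


def simulate() -> tuple[int, int, int]:
    history = SiteHistory()
    original = {f"/page-{i}.html": f"page {i}" for i in range(1000)}
    first = history.required_paths(manifest_of(original))

    edited = dict(original)
    for i in range(5):  # change 5 of the 1000 files
        edited[f"/page-{i}.html"] = f"page {i} (edited)"
    second = history.required_paths(manifest_of(edited))

    # Reverting to the original content matches checksums from the first deploy.
    third = history.required_paths(manifest_of(original))
    return len(first), len(second), len(third)


if __name__ == "__main__":
    print(*simulate())  # prints "1000 5 0"
```

The third deploy requests nothing, because every checksum in it was already seen in an earlier deploy of the same site.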