I have not been able to find a definitive answer.
I have always assumed that Netlify performs a full-depth clone of repositories, and I have no evidence to suggest otherwise. But I would appreciate a confirmation of my assumption.
Thanks much.
hi @jmooring
Our clones are not shallow, but we do perform a blobless clone, which makes it even faster.
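For reference, a blobless clone is roughly what you would get from the following command (the exact invocation on our build infrastructure may differ):

```bash
# Blobless clone: commits and trees are downloaded up front,
# while file contents (blobs) are fetched on demand as needed.
git clone --filter=blob:none <repository-url>
```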
Hope this helps!
Thank you for the confirmation. Much appreciated.
Hi @gualter
I’m curious, what made you choose a blobless clone by default instead of a treeless clone?
Hi, @slorber. I won't have time to track down all the stakeholders in that decision to ask why it was made. However, the blog post @gualter linked to above has clues. Quoting that blog post:
Quick Summary
There are three ways to reduce clone sizes for repositories hosted by GitHub.
- `git clone --filter=blob:none <url>` creates a blobless clone. These clones download all reachable commits and trees while fetching blobs on-demand. These clones are best for developers and build environments that span multiple builds.
- `git clone --filter=tree:0 <url>` creates a treeless clone. These clones download all reachable commits while fetching trees and blobs on-demand. These clones are best for build environments where the repository will be deleted after a single build, but you still need access to commit history.
- `git clone --depth=1 <url>` creates a shallow clone. These clones truncate the commit history to reduce the clone size. This creates some unexpected behavior issues, limiting which Git commands are possible. These clones also put undue stress on later fetches, so they are strongly discouraged for developer use. They are helpful for some build environments where the repository will be deleted after a single build.
Above, the shallow clone is strongly discouraged, and the treeless clone is only recommended when the repo will be deleted after each build (which is not the case at Netlify).
At Netlify, the cloned repo is preserved and reused via the build cache for subsequent builds. Above, it says that the blobless clone is "best for developers and build environments that span multiple builds," which is the case at Netlify.
To summarize, we are following the same recommendations made in the blog post above.
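If you want to see the trade-off yourself, here is a rough, unofficial way to compare the on-disk size of the three clone modes described above (sizes vary per repository; the repo URL is just an example):

```bash
# Clone the same repo in the three modes from the blog post,
# then compare how much each one downloaded.
git clone --filter=blob:none https://github.com/facebook/docusaurus.git blobless
git clone --filter=tree:0 https://github.com/facebook/docusaurus.git treeless
git clone --depth=1 https://github.com/facebook/docusaurus.git shallow
du -sh blobless treeless shallow
```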
Hi there!
I just wanted to document something interesting that I found regarding the blobless clones Netlify is using.
There are various docs frameworks that read Markdown files and output a docs website: Docusaurus, Astro Starlight, Nextra, MkDocs, Fumadocs, Rspress, VitePress.
These doc frameworks usually need to display a "last updated at / author" line at the bottom of their docs pages. It turns out that implementing this feature by reading from the Git history can be a major build-time performance bottleneck. I've documented all this here: Docs sites - read "last commit date/author" efficiently from Git · Issue #216 · e18e/ecosystem-issues · GitHub
I'm the maintainer of Docusaurus. To improve the performance of reading the Git history, we are moving from thousands of individual `git log` commands to a single `git log --name-status` command that reads everything at once ahead of time.
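As a minimal sketch of the idea (not the exact Docusaurus implementation), the single pass looks something like this, with the output parsed in userland afterwards:

```bash
# One traversal of the whole history instead of one `git log` per file.
# Each commit prints a header line (hash, committer date, author)
# followed by the files it touched.
git log --format="COMMIT %H %ct %an" --name-status > git-history.txt
```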
The problem with blobless clones is that the `git log --name-status` command will be very slow on the first run, because apparently the command has to lazily download the missing blobs one at a time to produce the output we want.
You can see this behavior while running:
```bash
git clone --filter=blob:none git@github.com:facebook/docusaurus.git docusaurus-blobless
cd docusaurus-blobless
git --no-pager log --name-status # Slow
git --no-pager log --name-status # Fast
```
Fortunately, Git (2.49+) has a `git backfill` command that downloads the missing blobs in batches, much faster than downloading them individually: Git - git-backfill Documentation
```bash
git clone --filter=blob:none git@github.com:facebook/docusaurus.git docusaurus-blobless
cd docusaurus-blobless
git backfill # Reasonably fast
git --no-pager log --name-status # Fast
```
Note that Netlify caches the result of lazily or explicitly backfilling the missing blobs, so all this only affects new/fresh Netlify CI runs with a cold cache.
Using `git backfill` on Netlify works well for us, and the impact has been quite significant for that first run. I've documented the results in depth here: feat(core): New siteConfig `future.experimental_vcs` API + `future.experimental_faster.gitEagerVcs` flag by slorber · Pull Request #11512 · facebook/docusaurus · GitHub
- With `git backfill` (explicit/eager backfilling): ~7.5s
- Without `git backfill` (lazy backfilling): ~90s
I believe it would be simpler if Netlify didn't perform a blobless clone by default, because this puts the burden on us to document how to improve build times on Netlify. I'm also not 100% sure the time saved is significant, considering how long it takes to run `git backfill`.
However, blobless clones may still present an advantage for power users who really want to optimize:
What I mean is that you do not need to wait for the blobs to be downloaded to run your userland code: it could run a bit earlier. The impact wouldn’t be massive, but still an interesting fact to be aware of.
This is what I implemented in this PR, and it seems to work fine: chore(ci): Improve Netlify cache + Run `git backfill` in parallel by slorber · Pull Request #11554 · facebook/docusaurus · GitHub
```toml
[context.production]
command = "(echo 'Build packages start' && yarn build:packages && echo 'Build packages end') & (echo 'Git backfill start' && git backfill && echo 'Git backfill end') & wait && yarn build:website"
```
I mostly documented this behavior for myself and to keep a history, but I hope this information will be helpful to someone else!