Hi there!
I just wanted to document something interesting that I found regarding the blobless clones Netlify is using.
There are various docs frameworks that read Markdown files and output a docs website: Docusaurus, Astro Starlight, Nextra, MkDocs, Fumadocs, Rspress, VitePress.
These doc frameworks usually need to display a “last updated at / author” line at the bottom of their docs pages. It turns out that implementing this feature by reading from the Git history can be a major build-time performance bottleneck. I’ve documented all this here: Docs sites - read “last commit date/author” efficiently from Git · Issue #216 · e18e/ecosystem-issues · GitHub
I’m the maintainer of Docusaurus. To improve the performance of reading the Git history, we are moving from thousands of individual `git log` commands to a single `git log --name-status` command that reads everything at once ahead of time.
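For context, here is a rough sketch of the two approaches (the format strings and the file path are illustrative, not the exact ones Docusaurus uses):

```bash
# Before: one git process per documentation page, spawned thousands of times
git log -1 --format="%an, %ad" -- docs/intro.md

# After: a single pass that emits author/date plus the touched files for
# every commit; the output is parsed once and reused for all pages
git --no-pager log --format="%an, %ad" --name-status
```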
The problem with blobless clones is that the `git log --name-status` command will be very slow on the first run, because Git has to lazily download the missing blobs one at a time to produce the output we want.
You can see this behavior by running:
```bash
git clone --filter=blob:none git@github.com:facebook/docusaurus.git docusaurus-blobless
cd docusaurus-blobless
git --no-pager log --name-status # Slow
git --no-pager log --name-status # Fast
```
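As a side note, you can verify that the blobs are actually missing right after the clone, before the first `git log --name-status` triggers the lazy fetches (this needs Git ≥ 2.16, which introduced partial clone support):

```bash
# Objects that are promised but absent locally are printed with a leading "?"
git rev-list --objects --all --missing=print | grep -c '^?'
```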
Fortunately, Git (2.49+) has a `git backfill` command that downloads the missing blobs in batches, which is much faster than downloading them individually: Git - git-backfill Documentation
```bash
git clone --filter=blob:none git@github.com:facebook/docusaurus.git docusaurus-blobless
cd docusaurus-blobless
git backfill # Reasonably fast
git --no-pager log --name-status # Fast
```
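Since `git backfill` only exists in Git 2.49+, it’s worth checking which Git version the CI image ships before relying on it. The batch size is also tunable; my understanding from the docs is that `--min-batch-size` controls how many objects are requested per fetch:

```bash
git version   # must report 2.49 or later for git backfill to be available

# Larger batches mean fewer round trips to the remote (flag per git-backfill docs)
git backfill --min-batch-size=100000
```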
Note that Netlify caches the result of lazily or explicitly backfilling the missing blobs, so all this only has an impact on new/fresh Netlify CI runs with a cold cache.
Using `git backfill` on Netlify works well for us, and the impact has been quite significant for that first run. I’ve documented the results in depth here: feat(core): New siteConfig `future.experimental_vcs` API + `future.experimental_faster.gitEagerVcs` flag by slorber · Pull Request #11512 · facebook/docusaurus · GitHub
- With `git backfill` (explicit/eager backfilling): ~7.5s
- Without `git backfill` (lazy backfilling): ~90s
I believe it would be simpler if Netlify didn’t perform a blobless clone by default, because this now puts the burden on us to document how to improve build times on Netlify, and I’m not 100% sure the time saved is significant considering how long it takes to run `git backfill`.
However, blobless clones may still present an advantage for power users that really want to optimize:
- You can start with a blobless clone
- You can run some build tasks in parallel with `git backfill`
What I mean is that you do not need to wait for the blobs to be downloaded to run your userland code: it could run a bit earlier. The impact wouldn’t be massive, but still an interesting fact to be aware of.
This is what I implemented in this PR, and it seems to work fine: chore(ci): Improve Netlify cache + Run `git backfill` in parallel by slorber · Pull Request #11554 · facebook/docusaurus · GitHub
```toml
[context.production]
command = "(echo 'Build packages start' && yarn build:packages && echo 'Build packages end') & (echo 'Git backfill start' && git backfill && echo 'Git backfill end' ) & wait && yarn build:website"
```
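For readability, here’s the same shape spelled out as a standalone script rather than as a single inlined command (just a sketch of the idea, not the exact command from the PR):

```bash
#!/usr/bin/env bash
set -e

# Kick off the package build and the blob backfill at the same time
yarn build:packages &
build_pid=$!

git backfill &
backfill_pid=$!

# Wait on each job separately so a failure in either one fails the build
wait "$build_pid"
wait "$backfill_pid"

# The website build needs the packages, and benefits from the blobs
# already being present when it reads the Git history
yarn build:website
```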
I mostly documented this behavior for myself and to keep a history, but I hope this information will be helpful to someone else!