Please confirm repo clones are NOT shallow

Hi there :waving_hand:

I just wanted to document something interesting that I found regarding the blobless clones Netlify is using.

There are various docs frameworks that read Markdown files and output a docs website: Docusaurus, Astro Starlight, Nextra, MkDocs, Fumadocs, Rspress, VitePress.

These docs frameworks usually need to display a “last updated at / author” at the bottom of their docs pages. And it turns out that implementing this feature by reading from the Git history can be a major performance bottleneck in terms of build times. I’ve documented all this here: Docs sites - read “last commit date/author” efficiently from Git · Issue #216 · e18e/ecosystem-issues · GitHub

I’m the maintainer of Docusaurus. To improve the performance of reading the Git history, we are moving from thousands of individual `git log` commands (one per file) to a single `git log --name-status` command that reads everything at once ahead of time.
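To make the “single command” idea concrete, here is a minimal shell sketch. It is not Docusaurus’s actual implementation: the `last_commit_per_file` name, the `@` header marker and the tab-separated output are all choices made for this illustration. It relies on `git log` listing commits newest first, so the first time a path shows up is its most recent change.

```shell
# Hypothetical helper: print "<path>\t<unix timestamp>\t<author>" for the most
# recent commit touching each file, using a single `git log` invocation.
last_commit_per_file() {
  # Each commit is printed as "@<author timestamp>\t<author name>", followed by
  # its "<status>\t<path>" lines. --no-renames reports renames as delete + add.
  git --no-pager log --format='@%at%x09%an' --name-status --no-renames |
    awk -F '\t' '
      /^@/ { ts = substr($1, 2); author = $2; next }  # commit header line
      NF >= 2 && !($2 in seen) {                       # first sighting = newest commit
        seen[$2] = 1
        if ($1 != "D") print $2 "\t" ts "\t" author    # skip files whose last change is a deletion
      }
    '
}
```

Paths with unusual characters are quoted by Git by default, and merge commits produce no file list here, so a real implementation would need to decide how to handle both.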

The problem with blobless clones is that the `git log --name-status` command will be very slow on the first run, because the command apparently has to lazily download the missing blobs one at a time to produce the output we want.

You can see this behavior while running:

git clone --filter=blob:none git@github.com:facebook/docusaurus.git docusaurus-blobless
cd docusaurus-blobless
git --no-pager log --name-status # Slow
git --no-pager log --name-status # Fast
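If you are not sure what kind of clone your CI gave you, the following sketch may help. It assumes the remote is named `origin`; the `clone_kind` name is made up for this example. It relies on `git rev-parse --is-shallow-repository` and on the `remote.origin.partialclonefilter` setting that `git clone --filter=...` writes to the repository config.

```shell
# Hypothetical helper: print "shallow", "partial (<filter>)" or "full"
# for the repository in the current directory.
clone_kind() {
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
    echo "shallow"
    return
  fi
  # Set by `git clone --filter=blob:none`; empty for a regular clone.
  filter=$(git config --get remote.origin.partialclonefilter || true)
  if [ -n "$filter" ]; then
    echo "partial ($filter)"
  else
    echo "full"
  fi
}
```

In the `docusaurus-blobless` directory from the commands above, I would expect this to print `partial (blob:none)`.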

Fortunately, Git (2.49+) has a `git backfill` command that downloads the missing blobs in batches, which is much faster than downloading them individually: Git - git-backfill Documentation

git clone --filter=blob:none git@github.com:facebook/docusaurus.git docusaurus-blobless
cd docusaurus-blobless
git backfill # Reasonably fast
git --no-pager log --name-status # Fast
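Since `git backfill` only exists in Git 2.49+, a build script may want to guard the call. Below is a minimal sketch under that assumption; `version_at_least` and `backfill_if_supported` are hypothetical helper names, and the version is compared numerically because a plain string comparison would rank `2.5` above `2.49`.

```shell
# Hypothetical helper: succeeds when version string $1 (e.g. "2.49.0")
# is at least major $2, minor $3.
version_at_least() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Run `git backfill` only when the installed Git is recent enough.
backfill_if_supported() {
  version=$(git --version | awk '{print $3}')  # "git version 2.49.0" -> "2.49.0"
  if version_at_least "$version" 2 49; then
    git backfill
  else
    echo "git $version has no backfill command; blobs will be fetched lazily" >&2
  fi
}
```

On an older Git this simply falls back to the lazy behavior described above, so the build still works, only slower on a cold cache.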

Note that Netlify caches the result of lazily or explicitly backfilling the missing blobs, so all this only has an impact on new/fresh Netlify CI runs with a cold cache.

Using `git backfill` on Netlify works well for us, and the impact has been quite significant for that first run. I’ve documented the results in depth here: feat(core): New siteConfig `future.experimental_vcs` API + `future.experimental_faster.gitEagerVcs` flag by slorber · Pull Request #11512 · facebook/docusaurus · GitHub

  • With git backfill (explicit/eager backfilling): ~7.5s

  • Without git backfill (lazy backfilling): ~90s


I believe it would be simpler if Netlify didn’t perform a blobless clone by default, because this now puts the burden on us to document how to improve build times on Netlify. I’m also not 100% sure the time saved by the blobless clone is huge, considering how long it takes to run `git backfill` afterwards.

However, blobless clones may still present an advantage for power users that really want to optimize:

  • You can start with a blobless clone
  • You can start running some tasks in parallel with running `git backfill`

What I mean is that you do not need to wait for the blobs to be downloaded to run your userland code: it could run a bit earlier. The impact wouldn’t be massive, but still an interesting fact to be aware of.

This is what I implemented in this PR, and it seems to work fine: chore(ci): Improve Netlify cache + Run `git backfill` in parallel by slorber · Pull Request #11554 · facebook/docusaurus · GitHub

[context.production]
  command = "(echo 'Build packages start' && yarn build:packages && echo 'Build packages end') & (echo 'Git backfill start' && git backfill && echo 'Git backfill end') & wait && yarn build:website"
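One caveat worth knowing about this pattern: in POSIX shells, a bare `wait` with no arguments exits with status 0 even if a background job failed, so `yarn build:website` would still run after a failed `yarn build:packages`. The sketch below is one possible way to propagate failures by waiting on each PID separately; `run_both` is a hypothetical helper and not what the PR above uses.

```shell
# Hypothetical helper: run two commands concurrently and return a non-zero
# status if either of them fails.
run_both() {
  sh -c "$1" &
  pid1=$!
  sh -c "$2" &
  pid2=$!
  status=0
  wait "$pid1" || status=$?   # `wait <pid>` returns that job's exit status
  wait "$pid2" || status=$?
  return "$status"
}

# Usage mirroring the command above (commented out so the sketch runs anywhere):
# run_both 'yarn build:packages' 'git backfill' && yarn build:website
```

In a Netlify setup this would probably live in a small script file that the `command` setting calls, rather than inline in the TOML string.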

I mostly documented this behavior for myself and to keep a history, but I hope this information will be helpful to someone else! :waving_hand: