Okay! So I think I figured this out. This issue occurs when the production branch targeted by the deploy doesn’t yet contain the base directory that the deploy is configured to use. In my case, I had created a new directory on a branch, switched my deploy settings to use that new directory as the build's base directory, then opened a PR to merge the new directory into my main branch (the production deploy branch).
Every Netlify build failed unless I used “clear cache and retry”, and that kept happening until I merged the new branch into the production branch. After the merge, later Netlify runs all worked without clearing the cache.
I do think this is a bug in Netlify, though: the missing-directory case should be handled by the cache manager, which should treat it as an invalid cache and clear the cache automatically rather than failing the build. So my solution doesn’t fix the core issue, which is that builds are all broken while you are working on a branch that moves the base directory of the Netlify build for the production branch.
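To make the behavior I'm suggesting concrete, here is a rough sketch in Python. It is not Netlify's actual cache manager (I have no visibility into how that is implemented); the function name `prepare_cache` and its parameters are made up for illustration. The idea is only that a missing base directory gets treated as "cache is invalid, start fresh" instead of as a fatal error.

```python
import shutil
from pathlib import Path


def prepare_cache(repo_root: Path, base_dir: str, cache_dir: Path) -> bool:
    """Hypothetical helper, not a Netlify API.

    Returns True if the existing cache is kept, False if it was cleared
    because the configured base directory is missing from the checkout.
    """
    if (repo_root / base_dir).is_dir():
        # Base directory exists in this checkout, so the cache can be reused.
        return True

    # Base directory is missing: treat the cache as invalid and reset it
    # instead of failing the build.
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    return False
```

With something like this, a build on a branch that introduces or moves the base directory would just start from an empty cache, which is what “clear cache and retry” does by hand today.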
@adamf, if you could share the failing deploy, I can look at the logs on our end to see exactly why it failed. But yeah, a deploy shouldn’t fail before the build command even begins just because the publish directory doesn’t exist, so we’d like to get a better idea of why it did. Thanks!