Deploys are getting stuck with just a "Creating deploy upload records" message

Hey :wave:

I’m trying to deploy my website, and sometimes the build gets stuck for hours on the initial step, Creating deploy upload records. Can I get any clarification on why? Yesterday I left it running for 8 hours and it didn’t finish, so I cancelled it and tried again, and it got stuck again. I managed to release it once, as you can see in this log, but I can’t seem to make it work again.

It is a simple create-react-app that builds in 30 seconds, so it shouldn’t take that long to deploy…
This is how I’m releasing it: netlify deploy --dir=build --site="<site-id>", and that returns:

Deploying to draft URL...
✔ Finished hashing 45 files
✔ CDN requesting 14 files
✔ Finished uploading 14 assets
◐ Waiting for deploy to go live... ›   Warning:
 ›   {
 ›      "name": "TimeoutError"
 ›   }
 ›
◑ Waiting for deploy to go live...TimeoutError: Promise timed out after 1200000 milliseconds
    at Timeout._onTimeout (~/.config/yarn/global/node_modules/netlify/node_modules/p-timeout/index.js:34:63)

Site url: https://foxyb-site.netlify.app (no production deployment yet, but the site name is foxyb-site)

The full log is available in the Netlify app, but it contains just one line: 7:55:20 PM: Creating deploy upload records
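The CLI output above stops at "Waiting for deploy to go live..." and then fails with "Promise timed out after 1200000 milliseconds", i.e. 20 minutes. A wait of this general shape polls the deploy's state until it is live and gives up at a deadline, so a deploy that never leaves an early state surfaces as a timeout rather than an error. This is a minimal TypeScript sketch under stated assumptions, not the CLI's actual code: `getDeployState` is a hypothetical stand-in for an API call, and the state names "ready" and "error" are assumed for illustration.

```typescript
// Sketch only: `getDeployState` is a hypothetical callback returning the
// deploy's current state; it is not a real Netlify client function.
// The state names "ready" and "error" are assumptions for illustration.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function waitForDeployReady(
  getDeployState: () => Promise<string>,
  timeoutMs = 1200000, // 20 minutes, the figure in the CLI's TimeoutError
  pollIntervalMs = 1000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const state = await getDeployState();
    if (state === "ready") return state; // deploy is live
    if (state === "error") throw new Error("deploy failed");
    // Any other state (e.g. one that never leaves "uploading") keeps polling.
    await sleep(pollIntervalMs);
  }
  throw new Error(`Timed out after ${timeoutMs} milliseconds waiting for the deploy to go live`);
}
```

The point of the sketch is that the timeout is only a symptom: the 20-minute limit fires because the deploy never reached a live state, so the useful question is why it stalled, not how long to wait.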

Hi, this might be caused by an outdated version of the CLI. Can you update to the latest version and see if that fixes things?

@perry I’m seeing a bunch of the same thing today from deploys initiated by the netlify js client, from github actions. they just hang forever and the only entry in the logs is Creating deploy upload records. Here’s one example:

Is there some sort of internal issue happening with API deploys?

hey @jaredh159,

I’ve checked with a few folks and we don’t see an immediate cause for the behaviour you are seeing.

I’ve raised the issue with another team that might have more insight into this, we’ll get a chance to talk to them tomorrow to see if they have any ideas. In the meantime, are you able to deploy another way?

Hi @jaredh159 and good news - team had time to look at it already!

As far as they can tell, you have written a custom API client: the user agent is github action. In any case, we did not write it, and it appears to be “doing things wrong”, most likely either using too high a concurrency (current limit = 5) or failing to respond appropriately to HTTP 429 rate-limit responses (which are non-fatal for well-behaved clients).
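The two behaviours described above (no more than 5 requests in flight, and HTTP 429 treated as "wait and retry" rather than fatal) can be illustrated with a minimal TypeScript sketch. `UploadFn`, `UploadResult`, and its `retryAfterSeconds` field are hypothetical stand-ins, not the Netlify client's types; the 500 ms base delay, 30 s cap, and 5 retries are arbitrary choices.

```typescript
// Hypothetical shapes: not the Netlify API client's actual types.
type UploadResult = { status: number; retryAfterSeconds?: number };
type UploadFn = (path: string) => Promise<UploadResult>;

function delay(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Wait before retry number `attempt` (0-based): honor the server's
// Retry-After hint when present, else exponential backoff capped at maxMs.
function retryDelayMs(attempt: number, retryAfterSeconds?: number, baseMs = 500, maxMs = 30000): number {
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// A 429 means "slow down", so back off and try the same file again.
async function uploadWithRetry(upload: UploadFn, path: string, maxRetries = 5, baseMs = 500): Promise<UploadResult> {
  for (let attempt = 0; ; attempt++) {
    const res = await upload(path);
    if (res.status !== 429) return res;
    if (attempt >= maxRetries) throw new Error(`${path}: still rate limited after ${maxRetries} retries`);
    await delay(retryDelayMs(attempt, res.retryAfterSeconds, baseMs));
  }
}

// A fixed pool of workers pulls from a shared queue, so no more than
// `concurrency` uploads are ever in flight at once.
async function uploadAll(upload: UploadFn, paths: string[], concurrency = 5, baseMs = 500): Promise<UploadResult[]> {
  const results: UploadResult[] = new Array(paths.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < paths.length) {
      const i = next++; // safe: no await between the check and the increment
      results[i] = await uploadWithRetry(upload, paths[i], 5, baseMs);
    }
  }
  await Promise.all(Array.from({ length: Math.min(concurrency, paths.length) }, worker));
  return results;
}
```

Retrying inside the worker (rather than re-queuing) keeps the in-flight count bounded even while a file is backing off, which is what keeps a client under a concurrency limit during a burst of 429s.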

What happens when you try to deploy using our CLI instead?

@fool thanks for the reply!

I did not write my own client, i used the official netlify/js-client node package, but it is run from a github action, which i’m guessing is why the user agent came through as github action. so i’m confused how it could be “doing things wrong” since it’s the official client put out by you guys.

however, I can tell you that i have a bunch of packages in a single monorepo, and there are often two deploys (sometimes more) running concurrently, all using the official node client from a github action. could that be causing rate-limit issues? i would have thought that the client would recover gracefully from that scenario.

Do you know how the github action uses the library? I wouldn’t expect its user agent to leak out when it is making those API calls since we set it explicitly to something different:

We have 2 site names for dev and prod: chargedup-ui-servedup-order-dev & chargedup-ui-servedup-order-prod.

We use CircleCI to deploy automatically. The CircleCI pipeline times out after 10 minutes because the Netlify build isn’t running. When we rerun the pipeline stage in CircleCI, it runs successfully.

I have attached a screenshot showing the log line where the build is hanging: [Screenshot 2020-09-09 at 15.22.47]

Hi there, I have moved this post to this thread, as it has the same error message.

Can you review the notes above and see what/if any apply to you? thanks!

We have a similar issue at the moment; however, we’re building through Gatsby Cloud, and the issue resolves after about 4.5 minutes and then deploys as normal. I’d imagine it’s quite likely they’re using the same library to make these calls, but it would (presumably) have a different user agent.

The builds aren’t big (1 minute for a full build), so for it to take 5 minutes to deploy is strange.

Gatsby Cloud has been told how to make their deploys more reliable; it’s up to them to implement the advice we gave. The advice was to undo the modifications they made to our API client, which should make their deploys much more stable.

Sadly, I don’t think the above helps with our issue, since we’re not using the netlify-cli. I’ve included our CircleCI config.yml file below:

defaults: &defaults
  docker:
    - image: circleci/node:13
  working_directory: ~/repo

restore_cache: &restore_cache
  restore_cache:
    keys:
      - v1-dependencies-{{ checksum "package.json" }}
      - v1-dependencies-

version: 2.1
orbs:
  slack: circleci/slack@3.4.2

jobs:
  prepare:
    <<: *defaults
    steps:
      - checkout
      - <<: *restore_cache
      - run: yarn install
      - save_cache:
          paths:
            - node_modules
          key: v1-dependencies-{{ checksum "package.json" }}

  build-deploy-dev:
    <<: *defaults
    steps:
      - checkout
      - <<: *restore_cache
      - run: yarn build:dev
      - run: yarn deploy --site 91262ffb-3562-4e2e-994c-a59d3f399648 --auth $NETLIFY_ACCESS_TOKEN
      - slack/status

  build-deploy-prod:
    <<: *defaults
    steps:
      - checkout
      - <<: *restore_cache
      - run: yarn build:prod
      - run: yarn deploy --site e0f0c947-2260-439f-a0ba-da93edf2c786 --auth $NETLIFY_ACCESS_TOKEN
      - slack/status

workflows:
  version: 2
  build_and_deploy:
    jobs:
      - prepare:
          filters:
            branches:
              only: master
      - build-deploy-dev:
          context: NETLIFY
          requires:
            - prepare
      - hold:
          type: approval
          requires:
            - build-deploy-dev
      - build-deploy-prod:
          context: NETLIFY
          requires:
            - hold

Any help would be much appreciated. Interestingly, I just bumped the version down to 2 rather than 2.1, and the 2 deploys I did straight afterwards went through completely fine. But now it’s back to failing again.
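The report above that rerunning the failed CircleCI step usually succeeds suggests a stopgap: give each deploy attempt its own time budget and start a fresh attempt when it runs over. (CircleCI’s default `no_output_timeout` for a `run` step is 10 minutes, which matches the 10-minute timeout described earlier; raising it only gives a hung deploy more time, it does not fix it.) This is a minimal TypeScript sketch, not a fix for the underlying issue: `startDeploy` is a hypothetical callback standing in for whatever runs the deploy, and the 3 attempts and 5-minute budget are arbitrary.

```typescript
// Reject if `p` has not settled within `ms`. Note: this does NOT cancel the
// underlying work; a timed-out deploy may keep running (or stay stuck) server-side.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`attempt timed out after ${ms} ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Hypothetical wrapper: `startDeploy` begins a brand-new deploy each call.
async function deployWithRetries<T>(
  startDeploy: () => Promise<T>,
  attempts = 3,
  perAttemptMs = 5 * 60 * 1000, // 5 minutes per attempt (arbitrary)
): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await withTimeout(startDeploy(), perAttemptMs);
    } catch (err) {
      lastError = err;
      console.error(`deploy attempt ${i}/${attempts} failed: ${String(err)}`);
    }
  }
  throw lastError;
}
```

Because each attempt creates a new deploy, earlier stuck deploys are left behind rather than resumed; this mirrors what a manual rerun does, it just does it without a human in the loop.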

@fool i wrote the action; it’s pretty vanilla. it basically just calls client.deploy() with a bunch of options, mostly passed in as environment variables from the github action workflow. you can see the source code here: friends-library/index.ts at master in the friends-library-dev/friends-library repository on GitHub

I’m certainly not setting the user-agent myself. Maybe github automatically changes the user agent for all outbound http requests sent out of an action? I have no idea…

Can you tell me from looking at the code in my action if I’m doing anything wrong? Or what I could do to get my deploys to not hang in this manner?

Do you think it’s a rate-limiting issue? If i got several access tokens from Netlify and used one per site that deploys from my github actions, would that keep them from getting rate limited when more than one deploy is happening at once?

as always, really appreciate the help!

@fool @perry all my deploys from github actions (using the netlify node client) are still failing today. is there anything more you know about this issue? this same technique was working very reliably for months until yesterday; now they all just keep hanging. i can do a manual deploy from my computer, but i’ve got my whole system built around ci/cd using github actions.

here’s another stuck deploy id: 5f5a5e5defce8f261a9e3941

Would love any help you guys could give. :pray:

(I also replied to @fool above with some more details about the deployment, and a link to the source code.)

I’m also getting these a LOT, and have been for some time now. Was anything changed? I’m also uploading directly through the API and retrying all 429s (I’m getting a lot of them). What should a well-behaved client do other than retry?

And is it the correct behavior that the deploy just gets stuck at “uploading”?

Here is a thread with more info: API Uploads (files only): Deploys get stuck in state: "uploading"
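On the “stuck at uploading” question: assuming the file-digest deploy flow, where creating a deploy returns a `required` list of SHA1 digests the API still needs, the deploy cannot move past `uploading` until every file with a required digest has been uploaded. A single upload that is dropped after a 429 (instead of retried) would therefore leave the deploy waiting indefinitely. A minimal TypeScript sketch of that bookkeeping follows; `pendingUploads` and its argument shapes are hypothetical, not part of the Netlify client.

```typescript
// Hypothetical helper. `files` maps deploy paths to SHA1 digests (as sent when
// creating the deploy), `required` is the digest list the API asked for, and
// `uploaded` holds the paths that have been uploaded successfully so far.
function pendingUploads(
  files: Record<string, string>,
  required: string[],
  uploaded: Set<string>,
): string[] {
  const requiredSet = new Set(required);
  // Any path whose digest is required but which has not been uploaded yet
  // is still outstanding; while this list is non-empty, the deploy waits.
  return Object.entries(files)
    .filter(([path, sha]) => requiredSet.has(sha) && !uploaded.has(path))
    .map(([path]) => path);
}
```

Checking this list after the upload phase (and retrying whatever is left) is one way for a client to avoid finishing its own loop while the deploy is still missing files.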

Hey all! Just to let you know, today, this was discussed with our backend team to investigate further. Once we know more or have further Qs, we’ll get back!

Ok, please do keep us updated if you find out anything. The really odd thing is that for me it always runs successfully when we rerun the flow on Circle CI from the failed point. Feel free to check out my recent deploys to see if your BE team can spot any patterns.

Hi, @tech1. I noticed that both of the sites you mentioned have been deploying successfully since about 8 am PDT on 2020-09-11.

Did you make some change to how you are deploying? We are investigating whether this is being caused by a specific version of the netlify or netlify-cli NPM packages:

Did you update one of those to resolve this? If not, would you be willing to share what you are doing differently now?

Hi Folx,

We are also experiencing this hanging when using netlify-cli in our CI-CD across quite a few of our sites. We are not using Gatsby Cloud, just vanilla netlify-cli 2.54.0.

@luke still can’t get any deploy to go through from Github Actions, it’s been over a week since they started failing. Do you have any updates? Has your team found anything? Here’s another stuck deploy, if that helps… 5f6102ac6d3cd401468e63e0

Appreciate any help you can give!

To recap, I’m using the official node.js api client, deploying from a github action. This method had been working almost perfectly for a long time until early last week.