DVC integration

Hello,

My site name is upscalerjs.com.

I’m using a tool called DVC (`dvc`) as part of my build script (think Git LFS). In my setup, DVC stores its files on Google Drive, for which it requires a service account JSON file.

Here is my build script:

pip install --upgrade pip && pip install dvc[all] && echo '>>>>>>>>' && GDRIVE_CREDENTIALS_DATA=$GDRIVE_CREDENTIALS_DATA dvc pull -v -r gdrive-service-account && npx pnpm i --store=node_modules/.pnpm-store && npx pnpm run build

The script fails at dvc pull. It seems that I am passing the $GDRIVE_CREDENTIALS_DATA incorrectly.

I’ve confirmed that I have a GDRIVE_CREDENTIALS_DATA environment variable and that the JSON string is encoded correctly. One thing I will note: if I set the variable to foo and echo it out, I see foo; if I set the variable to the JSON string, it simply prints >>>>. I assume Netlify is doing some obfuscation here.
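Since the build log masks the value itself, one way to inspect the variable is to print only facts about it that are not secret. The following is a minimal sketch, not anything Netlify or DVC provides: `describe_secret` is a hypothetical helper name, and it assumes that the length and the top-level JSON key names are safe to show in a log.

```python
import json
import os


def describe_secret(value):
    """Return non-sensitive facts about a secret string, never the value itself."""
    summary = {"length": len(value), "is_json": False, "keys": []}
    try:
        parsed = json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return summary
    summary["is_json"] = True
    if isinstance(parsed, dict):
        # Only the key names are reported, not the values stored under them.
        summary["keys"] = sorted(parsed)
    return summary


if __name__ == "__main__":
    # GDRIVE_CREDENTIALS_DATA is the variable used in the build script above.
    print(describe_secret(os.environ.get("GDRIVE_CREDENTIALS_DATA", "")))
```

Saved to a file and run as a step before dvc pull, this would show whether the build actually receives a JSON object or something much shorter.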

For comparison, here is a similar implementation in GitHub Actions that works.

Relevant bits of the build log:

4:07:12 AM: >>>>>>>>
4:07:12 AM: 2023-02-28 09:07:12,409 DEBUG: v2.45.1 (pip), CPython 3.8.10 on Linux-5.4.228-131.415.amzn2.x86_64-x86_64-with-glibc2.29
4:07:12 AM: 2023-02-28 09:07:12,409 DEBUG: command: /opt/buildhome/python3.8/bin/dvc pull -v -r gdrive-service-account
4:07:12 AM: 2023-02-28 09:07:12,725 DEBUG: Preparing to transfer data from '1tGm1wnv7pAhbSuy4u9Ci61xA8WD3DXXr' to '/opt/build/repo/.dvc/cache'
4:07:12 AM: 2023-02-28 09:07:12,725 DEBUG: Preparing to collect status from '/opt/build/repo/.dvc/cache'
4:07:12 AM: 2023-02-28 09:07:12,725 DEBUG: Collecting status from '/opt/build/repo/.dvc/cache'
4:07:12 AM: 2023-02-28 09:07:12,726 DEBUG: Preparing to collect status from '1tGm1wnv7pAhbSuy4u9Ci61xA8WD3DXXr'
4:07:12 AM: 2023-02-28 09:07:12,726 DEBUG: Collecting status from '1tGm1wnv7pAhbSuy4u9Ci61xA8WD3DXXr'
4:07:12 AM: 2023-02-28 09:07:12,726 DEBUG: Querying 1 oids via object_exists
4:07:12 AM: 2023-02-28 09:07:12,875 ERROR: unexpected error - Failed to authenticate GDrive: ('Unexpected credentials type', None, 'Expected', 'service_account')
4:07:12 AM: Traceback (most recent call last):
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/output.py", line 1011, in get_dir_cache
4:07:12 AM:     ocheck(self.cache, obj)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/__init__.py", line 20, in check
4:07:12 AM:     odb.check(obj.oid, **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/db/__init__.py", line 183, in check
4:07:12 AM:     _, actual = hash_file(obj.path, obj.fs, self.hash_name, self.state)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/hash.py", line 178, in hash_file
4:07:12 AM:     hash_value, meta = _hash_file(path, fs, name, callback=cb, info=info)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/hash.py", line 121, in _hash_file
4:07:12 AM:     info = info or fs.info(path)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 481, in info
4:07:12 AM:     return self.fs.info(path, **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_objects/fs/local.py", line 42, in info
4:07:12 AM:     return self.fs.info(path)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/fsspec/implementations/local.py", line 87, in info
4:07:12 AM:     out = os.stat(path, follow_symlinks=False)
4:07:12 AM: FileNotFoundError: [Errno 2] No such file or directory: '/opt/build/repo/.dvc/cache/b3/3f1f6aa2be21da44cbbd212427c505.dir'
4:07:12 AM: During handling of the above exception, another exception occurred:
4:07:12 AM: Traceback (most recent call last):
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/pydrive2/fs/spec.py", line 70, in _wrap_errors
4:07:12 AM:     yield
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/pydrive2/fs/spec.py", line 150, in _service_auth
4:07:12 AM:     auth.ServiceAuth()
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/pydrive2/auth.py", line 100, in _decorated
4:07:12 AM:     decoratee(self, *args, **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/pydrive2/auth.py", line 319, in ServiceAuth
4:07:12 AM:     ServiceAccountCredentials.from_json_keyfile_dict(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/oauth2client/service_account.py", line 251, in from_json_keyfile_dict
4:07:12 AM:     return cls._from_parsed_json_keyfile(keyfile_dict, scopes,
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/oauth2client/service_account.py", line 171, in _from_parsed_json_keyfile
4:07:12 AM:     raise ValueError('Unexpected credentials type', creds_type,
4:07:12 AM: ValueError: ('Unexpected credentials type', None, 'Expected', 'service_account')
4:07:12 AM: The above exception was the direct cause of the following exception:
4:07:12 AM: Traceback (most recent call last):
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/cli/__init__.py", line 210, in main
4:07:12 AM:     ret = cmd.do_run()
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/cli/command.py", line 26, in do_run
4:07:12 AM:     return self.run()
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/commands/data_sync.py", line 31, in run
4:07:12 AM:     stats = self.repo.pull(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 58, in wrapper
4:07:12 AM:     return f(repo, *args, **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/repo/pull.py", line 34, in pull
4:07:12 AM:     processed_files_count = self.fetch(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 58, in wrapper
4:07:12 AM:     return f(repo, *args, **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 86, in fetch
4:07:12 AM:     d, f = _fetch(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 142, in _fetch
4:07:12 AM:     used = repo.used_objs(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 476, in used_objs
4:07:12 AM:     for odb, objs in self.index.used_objs(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/repo/index.py", line 449, in used_objs
4:07:12 AM:     for odb, objs in stage.get_used_objs(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/stage/__init__.py", line 722, in get_used_objs
4:07:12 AM:     for odb, objs in out.get_used_objs(*args, **kwargs).items():
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/output.py", line 1100, in get_used_objs
4:07:12 AM:     obj = self._collect_used_dir_cache(**kwargs)
4:07:15 AM: Failed during stage 'building site': Build script returned non-zero exit code: 2 (https://ntl.fyi/exit-code-2)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/output.py", line 1033, in _collect_used_dir_cache
4:07:12 AM:     self.get_dir_cache(jobs=jobs, remote=remote)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/output.py", line 1015, in get_dir_cache
4:07:12 AM:     self.repo.cloud.pull([obj.hash_info], **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 181, in pull
4:07:12 AM:     return self.transfer(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 135, in transfer
4:07:12 AM:     return transfer(src_odb, dest_odb, objs, **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
4:07:12 AM:     status = compare_status(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 189, in compare_status
4:07:12 AM:     src_exists, src_missing = status(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 134, in status
4:07:12 AM:     exists = hashes.intersection(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 55, in _indexed_dir_hashes
4:07:12 AM:     dir_exists.update(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/tqdm/std.py", line 1183, in __iter__
4:07:12 AM:     for obj in iterable:
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_objects/db.py", line 357, in list_oids_exists
4:07:12 AM:     in_remote = self.fs.exists(paths, batch_size=jobs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 332, in exists
4:07:12 AM:     if self.fs.async_impl:
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/funcy/objects.py", line 50, in __get__
4:07:12 AM:     return prop.__get__(instance, type)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/funcy/objects.py", line 28, in __get__
4:07:12 AM:     res = instance.__dict__[self.fget.__name__] = self.fget(instance)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/dvc_gdrive/__init__.py", line 105, in fs
4:07:12 AM:     return _GDriveFileSystem(self._path, **self._settings)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/fsspec/spec.py", line 76, in __call__
4:07:12 AM:     obj = super().__call__(*args, **kwargs)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/pydrive2/fs/spec.py", line 220, in __init__
4:07:12 AM:     google_auth = _service_auth(
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/pydrive2/fs/spec.py", line 150, in _service_auth
4:07:12 AM:     auth.ServiceAuth()
4:07:12 AM:   File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
4:07:12 AM:     self.gen.throw(type, value, traceback)
4:07:12 AM:   File "/opt/buildhome/python3.8/lib/python3.8/site-packages/pydrive2/fs/spec.py", line 76, in _wrap_errors
4:07:12 AM:     raise GDriveAuthError("Failed to authenticate GDrive") from exc
4:07:12 AM: pydrive2.fs.spec.GDriveAuthError: Failed to authenticate GDrive
4:07:12 AM: 2023-02-28 09:07:12,930 DEBUG: Version info for developers:
4:07:12 AM: DVC version: 2.45.1 (pip)
4:07:12 AM: -------------------------
4:07:12 AM: Platform: Python 3.8.10 on Linux-5.4.228-131.415.amzn2.x86_64-x86_64-with-glibc2.29
4:07:12 AM: Subprojects:
4:07:12 AM: 	dvc_data = 0.40.3
4:07:12 AM: 	dvc_objects = 0.19.3
4:07:12 AM: 	dvc_render = 0.2.0
4:07:12 AM: 	dvc_task = 0.1.11
4:07:12 AM: 	dvclive = 2.1.0
4:07:12 AM: 	scmrepo = 0.1.11
4:07:12 AM: Supports:
4:07:12 AM: 	azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
4:07:12 AM: 	gdrive (pydrive2 = 1.15.1),
4:07:12 AM: 	gs (gcsfs = 2023.1.0),
4:07:12 AM: 	hdfs (fsspec = 2023.1.0, pyarrow = 11.0.0),
4:07:12 AM: 	http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
4:07:12 AM: 	https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
4:07:12 AM: 	oss (ossfs = 2021.8.0),
4:07:12 AM: 	s3 (s3fs = 2023.1.0, boto3 = 1.24.59),
4:07:12 AM: 	ssh (sshfs = 2023.1.0),
4:07:12 AM: 	webdav (webdav4 = 0.9.8),
4:07:12 AM: 	webdavs (webdav4 = 0.9.8),
4:07:12 AM: 	webhdfs (fsspec = 2023.1.0)
4:07:12 AM: Cache types: <https://error.dvc.org/no-dvc-cache>
4:07:12 AM: Caches: local
4:07:12 AM: Remotes: gdrive, s3, gdrive
4:07:12 AM: Workspace directory: xfs on /dev/nvme0n1p1
4:07:12 AM: Repo: dvc, git
4:07:12 AM: Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
4:07:12 AM: 2023-02-28 09:07:12,932 DEBUG: Analytics is enabled.
4:07:12 AM: 2023-02-28 09:07:12,959 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp02lbkm15']'
4:07:12 AM: 2023-02-28 09:07:12,960 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp02lbkm15']'
4:07:13 AM: ​
4:07:13 AM:   "build.command" failed                                        
4:07:13 AM: ────────────────────────────────────────────────────────────────
4:07:13 AM: ​
4:07:13 AM:   Error message
4:07:13 AM:   Command failed with exit code 255: pip install --upgrade pip && pip install dvc[all] && echo '>>>>>>>>' && GDRIVE_CREDENTIALS_DATA=$GDRIVE_CREDENTIALS_DATA dvc pull -v -r gdrive-service-account && npx pnpm i --store=node_modules/.pnpm-store && npx pnpm run build (https://ntl.fyi/exit-code-255)
4:07:13 AM: ​
4:07:13 AM:   Error location
4:07:13 AM:   In Build command from Netlify app:
4:07:13 AM:   pip install --upgrade pip && pip install dvc[all] && echo '>>>>>>>>' && GDRIVE_CREDENTIALS_DATA=$GDRIVE_CREDENTIALS_DATA dvc pull -v -r gdrive-service-account && npx pnpm i --store=node_modules/.pnpm-store && npx pnpm run build
4:07:13 AM: ​
4:07:13 AM:   Resolved config
4:07:13 AM:   build:
4:07:13 AM:     base: /opt/build/repo/docs
4:07:13 AM:     command: pip install --upgrade pip && pip install dvc[all] && echo '>>>>>>>>' && GDRIVE_CREDENTIALS_DATA=$GDRIVE_CREDENTIALS_DATA dvc pull -v -r gdrive-service-account && npx pnpm i --store=node_modules/.pnpm-store && npx pnpm run build
4:07:13 AM:     commandOrigin: ui
4:07:13 AM:     environment:
4:07:13 AM:       - GDRIVE_CREDENTIALS_DATA
4:07:13 AM:       - NETLIFY_BUILD_DEBUG
4:07:13 AM:       - PYTHON_VERSION
4:07:13 AM:       - NPM_FLAGS
4:07:13 AM:     ignore: git diff --quiet $CACHED_COMMIT_REF $COMMIT_REF .
4:07:13 AM:     publish: /opt/build/repo/docs/build
4:07:13 AM:     publishOrigin: config
4:07:13 AM: Caching artifacts
4:07:13 AM: Started saving node modules
4:07:13 AM: Finished saving node modules
4:07:13 AM: Started saving build plugins
4:07:13 AM: Finished saving build plugins
4:07:13 AM: Started saving corepack cache
4:07:13 AM: Finished saving corepack cache
4:07:13 AM: Started saving pip cache
4:07:13 AM: Finished saving pip cache
4:07:13 AM: Started saving emacs cask dependencies
4:07:13 AM: Finished saving emacs cask dependencies
4:07:13 AM: Started saving maven dependencies
4:07:13 AM: Finished saving maven dependencies
4:07:13 AM: Started saving boot dependencies
4:07:13 AM: Finished saving boot dependencies
4:07:13 AM: Started saving rust rustup cache
4:07:13 AM: Finished saving rust rustup cache
4:07:13 AM: Started saving go dependencies
4:07:13 AM: Finished saving go dependencies
4:07:15 AM: Build failed due to a user error: Build script returned non-zero exit code: 2
4:07:15 AM: Failing build: Failed to build site
4:07:15 AM: Finished processing build request in 1m59.269s

Can someone shed more light on how sensitive environment variables are handled during builds and deploys?

I’ve tried writing the environment variable to a local file and then running cat on that file. The output reads ****.

I’ve tried manually setting the environment variable inline, e.g. GDRIVE_CREDENTIALS_DATA='<MY_CONTENTS>', but the command in the log reads GDRIVE_CREDENTIALS_DATA='****' dvc pull.

I’ve set the Sensitive Variable Policy to “Deploy without restrictions”. It doesn’t seem to make a difference.

Is there any way I can avoid obfuscating my environment variables?

Hey @theory,

We see something very different from what you describe here when we check your site. Your current deploys seem to be failing at something other than dvc. Has something changed since you wrote in?

Sorry, I missed your reply, and yes, I subsequently modified things. I’ve reverted to the problem command.

Here’s my latest build script:

echo '>>>>>>>>' && pip install --upgrade pip && pip install dvc[all] && echo 'GDRIVE_CREDENTIALS_DATA' && echo $GDRIVE_CREDENTIALS_DATA && dvc pull -v -r gdrive-service-account && pnpm run build

And here are the relevant bits of the build log:

7:39:23 AM: GDRIVE_CREDENTIALS_DATA
7:39:23 AM: ****
7:39:23 AM: 2023-06-21 11:39:23,404 DEBUG: v3.1.0 (pip), CPython 3.8.10 on Linux-5.4.228-131.415.amzn2.x86_64-x86_64-with-glibc2.29
7:39:23 AM: 2023-06-21 11:39:23,404 DEBUG: command: /opt/buildhome/python3.8/bin/dvc pull -v -r gdrive-service-account
7:39:24 AM: 2023-06-21 11:39:24,065 ERROR: unexpected error - Failed to authenticate GDrive: ('Unexpected credentials type', None, 'Expected', 'service_account')

DVC requires that an environment variable, GDRIVE_CREDENTIALS_DATA, be populated with the contents of a Google Drive service account JSON file.

I’ve tried to confirm that the environment variable is being set correctly, but it’s being printed out obfuscated so I can’t confirm.

Would appreciate any other tips or pointers. By comparison, here’s the corresponding GitHub action which calls dvc pull without issues and then builds the site: UpscalerJS/.github/workflows/tests.yml at main · thekevinscott/UpscalerJS · GitHub

You can try echoing the environment variable to a file and reading that file to confirm whether the variable is correct. As far as I can see, your site has the variable and Netlify seems to be using it, so I’m not yet 100% convinced that this is something failing on our end.

I found a related issue: django - ('Unexpected credentials type', None, 'Expected', 'service_account') with oauth2client (Python) - Stack Overflow, maybe that applies to you?
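The error tuple in the log, ('Unexpected credentials type', None, 'Expected', 'service_account'), suggests that the credentials reaching oauth2client had no usable `type` value. A rough way to pre-check the variable's contents without printing them is sketched below. `check_service_account` and `REQUIRED_KEYS` are hypothetical names, and the three keys listed are an assumed minimal set, not the full list a real key file contains.

```python
import json

# Assumed minimal set of fields; a real service account key file has more.
REQUIRED_KEYS = ("type", "client_email", "private_key")


def check_service_account(raw):
    """Return a list of problems found in a service account JSON string."""
    if raw and set(raw) == {"*"}:
        return ["value is only asterisks; a masked placeholder may have been pasted"]
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ["value is not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = ["missing key: " + key for key in REQUIRED_KEYS if key not in data]
    # The traceback shows oauth2client rejecting any type other than this one.
    if data.get("type") != "service_account":
        problems.append("type is %r, expected 'service_account'" % data.get("type"))
    return problems
```

An empty list means none of these particular checks failed; it does not prove that Google will accept the key.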

I re-input the secret, and it now echoes out successfully. I’m not sure what changed - perhaps I was pasting **** as the secret previously?

My build is still failing, but that looks like an unrelated issue. The dvc portion now appears to be working.

Thanks for your help! I’ll mark this closed.

Awesome, thanks for coming back and sharing this with the community. Best of luck!

Is this Docker image still valid? Docker

My build is still intermittently failing and I’m trying to troubleshoot why locally. I’m attempting to reproduce using this Docker image, but the repo reports it’s been made closed source.

The repo also states this:

To support your troubleshooting, we will still publish the build-image to Docker Hub for now. We would love to make troubleshooting easier for you, but first we need to understand your unique situation.

So yes, the image is valid.