dvc.api.get_url(): ignores request to avoid source control and crashes #10608

Open
rgoya opened this issue Nov 4, 2024 · 0 comments

Bug Report

Description

When calling

dvc.api.get_url(
    path,
    repo,
    config={"core": {"no_scm": True}},
)

DVC still attempts to find a Git repository and raises an exception when it does not find one.

We have a monorepo, tracked by Git, that contains DVC repositories at different paths. During interactive work and development, users work with both DVC and Git. In some cases, however, tests and applications must run in an isolated environment that contains no Git information. That environment still contains all the required DVC configuration and internal files.

In such cases, we would like our code to access DVC information programmatically through the API, e.g. using dvc.api.get_url() to get the S3 URL of a file in the remote. The isolated environment no longer depends on Git, but the .dvc/config file it contains does not set no_scm = True. We therefore tried to use the config parameter to request that no SCM be expected (config={"core": {"no_scm": True}}).
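The intended effect of the config argument can be pictured as a deep merge: the override dict mirrors the [core] section of .dvc/config, and only the keys it names should change. This is a simplified sketch, assuming the override is merged key by key over the values read from disk. The deep_merge helper and the dict layout below are hypothetical and are not DVC's internal representation.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: `base` with `override` applied key by key.

    Nested dicts are merged recursively, so overriding one key in a
    section keeps that section's other keys.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical, simplified view of .dvc/config after steps 3-4 of the
# reproduction; note that it has no "no_scm" entry.
file_config = {
    "core": {"remote": "s3"},
    "remote": {"s3": {"url": "s3://fake-bucket/path"}},
}

# The override passed to dvc.api.get_url() in this report.
override = {"core": {"no_scm": True}}

effective = deep_merge(file_config, override)
print(effective["core"])  # {'remote': 's3', 'no_scm': True}
```

The on-disk file is left untouched; only the effective configuration seen by the caller gains no_scm.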

However, even though config={"core": {"no_scm": True}} instructs dvc.api.get_url() not to check for SCM, the call still fails with:

dvc.scm.SCMError: /tmp/test_repo is not a git repository

Reproduce

  1. cd into a folder under SCM (a Git repository), e.g. /path/to/test_repo
  2. dvc init --subdir
  3. dvc config core.remote s3
  4. dvc remote add -d s3 "s3://fake-bucket/path"
  5. touch test_file.txt
  6. dvc add test_file.txt
  7. Test with Python:

     import dvc.api
     url = dvc.api.get_url("test_file.txt", "/path/to/test_repo/")
     print(url)

     This prints s3://fake-bucket/path/files/md5/d4/1d8cd98f00b204e9800998ecf8427e
  8. Move the repository to a folder outside any Git repository: mv /path/to/test_repo/ /tmp/
  9. Test with Python:

     import dvc.api
     url = dvc.api.get_url("test_file.txt", "/tmp/test_repo/")
     print(url)

     This raises SCMError: /tmp/test_repo is not a git repository
  10. Try to skip the SCM check:

     import dvc.api
     url = dvc.api.get_url(
         "test_file.txt",
         "/tmp/test_repo/",
         config={"core": {"no_scm": True}},
     )
     print(url)

     This still raises SCMError: /tmp/test_repo is not a git repository

Expected

Step 10 in the reproduction above should skip the SCM check and return the corresponding URL, as in step 7.

Ideally, step 9 should also return the file's URL, since computing it does not require Git.
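The expectation for step 9 rests on the fact that the URL printed in step 7 depends only on the remote URL from .dvc/config and the file's MD5 hash, neither of which involves Git. Below is a minimal sketch, assuming the files/md5/<first two hex chars>/<rest> layout visible in step 7's output and ignoring directories and other hash types. remote_url_for is a hypothetical helper, not part of dvc.api. For illustration, the hash here is recomputed from the (empty) file content. DVC itself would take it from the metadata recorded in test_file.txt.dvc.

```python
import hashlib


def remote_url_for(remote_url: str, md5_hex: str) -> str:
    """Hypothetical helper: build the remote location of an object by its MD5.

    The layout <remote>/files/md5/<first 2 hex chars>/<remaining chars> is
    inferred from the URL printed in step 7, not taken from DVC's code.
    """
    return f"{remote_url.rstrip('/')}/files/md5/{md5_hex[:2]}/{md5_hex[2:]}"


# test_file.txt was created with `touch`, so its content is empty.
empty_md5 = hashlib.md5(b"").hexdigest()
url = remote_url_for("s3://fake-bucket/path", empty_md5)
print(url)  # s3://fake-bucket/path/files/md5/d4/1d8cd98f00b204e9800998ecf8427e
```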

Diagnosis and possible fix

dvc.api.get_url() calls Repo.open(), passing through all the parameters it received (including config) plus two fixed parameters, subrepos=True and uninitialized=True.

Repo.open() in turn calls _get_remote_config(url), which internally instantiates Repo(url). This Repo(url) call is the one that tries to find the SCM and fails.

Because _get_remote_config(url) receives only the URL, any parameters passed to dvc.api.get_url(), including config, are ignored at this point. Forwarding them (e.g. calling _get_remote_config(url, *args, **kwargs)) appears to fix the problem; a proof-of-concept PR is submitted in #10609.
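The call chain described above can be modelled with a few stand-in objects to show why dropping the arguments matters. This is a deliberately simplified sketch, not DVC's code. ToyRepo, open_repo, get_remote_config_buggy, get_remote_config_fixed, NotAGitRepoError and GIT_AVAILABLE are all hypothetical names. The only behaviour modelled is "look for Git unless core.no_scm is set". Remote-config merging, subrepos and uninitialized are left out.

```python
from typing import Any, Callable, Dict, Optional

# Toy environment: like /tmp/test_repo after step 8, there is no Git repository.
GIT_AVAILABLE = False


class NotAGitRepoError(Exception):
    """Stand-in for dvc.scm.SCMError in this toy model."""


class ToyRepo:
    """Stand-in for a repo object that looks for Git unless core.no_scm is set."""

    def __init__(self, root: str, config: Optional[Dict[str, Any]] = None) -> None:
        self.root = root
        self.config = config or {}
        no_scm = bool(self.config.get("core", {}).get("no_scm", False))
        if not no_scm and not GIT_AVAILABLE:
            raise NotAGitRepoError(f"{root} is not a git repository")


def get_remote_config_buggy(url: str, **kwargs: Any) -> Dict[str, Any]:
    # Mirrors the reported behaviour: the caller's arguments are dropped.
    return ToyRepo(url).config


def get_remote_config_fixed(url: str, **kwargs: Any) -> Dict[str, Any]:
    # Mirrors the PoC idea: forward the caller's arguments (including config).
    return ToyRepo(url, **kwargs).config


def open_repo(
    url: str,
    get_remote_config: Callable[..., Dict[str, Any]],
    **kwargs: Any,
) -> ToyRepo:
    get_remote_config(url, **kwargs)  # inner call that creates a repo object
    return ToyRepo(url, **kwargs)     # outer call, which does receive config


override = {"core": {"no_scm": True}}

try:
    open_repo("/tmp/test_repo", get_remote_config_buggy, config=override)
except NotAGitRepoError as exc:
    print("buggy:", exc)  # buggy: /tmp/test_repo is not a git repository

repo = open_repo("/tmp/test_repo", get_remote_config_fixed, config=override)
print("fixed:", repo.root)  # fixed: /tmp/test_repo
```

In the toy, the outer call already honours no_scm. The failure comes entirely from the inner call that never sees the override, which matches the diagnosis.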

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.56.0 (conda)
---------------------------
Platform: Python 3.10.12 on Linux-6.2.0-1018-aws-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.16.7
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.8
Supports:
        gdrive (pydrive2 = 1.19.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.10.0, boto3 = 1.28.54),
        ssh (sshfs = 2023.4.1)
Config:
        Global: /home/username/.config/dvc
        System: /etc/xdg/dvc

Additional Information (if any):
