Bug Report
Description
When calling `dvc.api.get_url()` with `config={"core": {"no_scm": True}}`, `dvc` still attempts to find a Git repository and raises an exception when it doesn't find one.

We have a monorepo with `dvc` repositories tracked by `git` on different paths. During interactive work and development, users interact with `dvc` and source control management. In some cases, tests and applications are required to run in an isolated environment that does not contain `git` information; the isolated environment contains all the required `dvc` configuration and internal files.

In such cases, we would like our code to access `dvc` information programmatically with the API, e.g. using the `dvc.api.get_url()` function to get the `s3` path to the remote file. Given that the isolated environment no longer depends on `git`, but the `.dvc/config` file is kept and does not contain `no_scm = True`, we attempted to use the `config` parameter to request that no SCM be expected (by using `config={"core": {"no_scm": True}}`).

However, even though `config={"core": {"no_scm": True}}` instructs `dvc.api.get_url()` to avoid checking for SCM, it still fails with:

`dvc.scm.SCMError: /tmp/test_repo is not a git repository`

Reproduce
1. `cd` into a folder under SCM, e.g. `/path/to/test_repo`
2. `dvc init --subdir`
3. `dvc config core.remote s3`
4. `dvc remote add -d s3 "s3://fake-bucket/path"`
5. `touch test_file.txt`
6. `dvc add test_file.txt`
7. Test with python: this prints `s3://fake-bucket/path/files/md5/d4/1d8cd98f00b204e9800998ecf8427e`
8. Move the repository to an untracked folder: `mv /path/to/test_repo/ /tmp/`
9. Test with python: this raises `SCMError: /tmp/test_repo is not a git repository`
10. Try to avoid the SCM check: this still raises `SCMError: /tmp/test_repo is not a git repository`
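The Python calls behind the "test with python" steps are not included in the text above. A minimal sketch of what steps 9 and 10 might look like follows, assuming the repository now lives at `/tmp/test_repo` and the tracked file is `test_file.txt`. The helper `try_get_url` and the constant names are hypothetical and only wrap `dvc.api.get_url()` so both variants can be compared side by side.

```python
import importlib.util

# Assumed locations, taken from the reproduction steps.
REPO = "/tmp/test_repo"
TRACKED_FILE = "test_file.txt"

# The config override the report passes to request that no SCM be expected.
NO_SCM_CONFIG = {"core": {"no_scm": True}}


def try_get_url(path, repo, config=None):
    """Hypothetical helper: return (url, error) instead of raising."""
    import dvc.api  # imported lazily so the module loads without DVC installed

    try:
        return dvc.api.get_url(path, repo=repo, config=config), None
    except Exception as exc:  # broad on purpose: we only want to report it
        return None, exc


# Only attempt the calls when DVC is actually installed.
if importlib.util.find_spec("dvc") is not None:
    # Step 9 (no override) and step 10 (with the no_scm override).
    for config in (None, NO_SCM_CONFIG):
        url, error = try_get_url(TRACKED_FILE, REPO, config)
        print(f"config={config!r}: url={url!r} error={error!r}")
```

According to the report, both iterations currently end with an `SCMError` once the repository is outside a Git working tree.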
Expected
Step 10 in the reproduction above should successfully avoid checking for SCM and return the corresponding file path as in step 7.
Step 9 in the reproduction above should likely still return the URL of the file, given that it doesn't require `git` to do so.

Diagnosis and possible fix
During the execution of `dvc.api.get_url()`, there is a call to `Repo.open()` to which all provided parameters are passed, including `config`, as well as two fixed parameters, `subrepos=True` and `uninitialized=True`.

`Repo.open()` then has a call to `_get_remote_config(url)`, which internally calls `Repo(url)`, and this last call tries to find the SCM.

The call to `_get_remote_config(url)` ignores any parameters being considered by `dvc.api.get_url()`. Re-establishing these parameters (e.g. calling `_get_remote_config(url, *args, **kwargs)`) appears to fix the problem (submitting a PoC PR here #10609).

Environment information
Output of `dvc doctor`:

Additional Information (if any):
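To illustrate the diagnosis above, here is a toy model of the parameter-dropping pattern. It is not DVC's code: `ToyRepo`, `open_repo`, and the two helper functions are hypothetical stand-ins, and the rule "every SCM lookup fails" is an assumption that mimics a folder outside Git. It only shows why forwarding the caller's keyword arguments to the inner constructor makes the `no_scm` override take effect.

```python
class SCMError(Exception):
    """Toy stand-in for the error raised when no Git repository is found."""


class ToyRepo:
    """Hypothetical repository object; not DVC's actual Repo class."""

    def __init__(self, url, config=None):
        core = (config or {}).get("core", {})
        if not core.get("no_scm", False):
            # Toy rule: pretend `url` is never inside a Git working tree.
            raise SCMError(f"{url} is not a git repository")
        self.url = url
        self.config = config


def get_remote_config_buggy(url, **kwargs):
    # Mirrors the reported behaviour: the caller's options are ignored.
    return ToyRepo(url).config


def get_remote_config_fixed(url, **kwargs):
    # Mirrors the proposed change: the caller's options are forwarded.
    return ToyRepo(url, **kwargs).config


def open_repo(url, helper, **kwargs):
    # The helper runs first, so it decides whether the SCM lookup happens.
    helper(url, **kwargs)
    return ToyRepo(url, **kwargs)


NO_SCM = {"core": {"no_scm": True}}

try:
    open_repo("/tmp/test_repo", get_remote_config_buggy, config=NO_SCM)
except SCMError as exc:
    print("buggy helper:", exc)

repo = open_repo("/tmp/test_repo", get_remote_config_fixed, config=NO_SCM)
print("fixed helper opened:", repo.url)
```

In this sketch the override reaches the outer constructor in both cases; only the inner helper call differs, which is the spot the report identifies.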